Graduate Econometrics Recitations: 2021-2022
Alex Houtz
March 24, 2023
Graduate Teaching Assistant | University of Notre Dame
Contents

Preface

1 Introduction to Matlab
  1.1 Matlab Basics
  1.2 Matlab Set-up
  1.3 Some Useful Commands
  1.4 Publishing
  1.5 General Advice

2 Univariate Probability
  2.1 Monty Hall
  2.2 Practice Problem 1: Hansen 2.7
  2.3 Practice Problem 2: Hansen 2.11
  2.4 Practice Problem 3: Hansen 3.5
  2.5 Practice Problem 4: Hansen 3.9

3 Multivariate Probability
  3.1 Expected Value of a Log-Normal Distribution
  3.2 Practice Problem 1: Hansen 4.1
  3.3 Practice Problem 2: Hansen 4.7
  3.4 Practice Problem 3: Multivariate Change of Variables

4 Dependent Random Variables
  4.1 Previous Problem: Hansen 4.14
  4.2 Practice Problem 1: Hansen 4.19
  4.3 Practice Problem 2: AR(2) Process
  4.4 Practice Problem 3: Deriving the MA(∞) Form for an AR(1)

5 Dependent Vectors of Random Variables
  5.1 Previous Problem 1: Hansen 5.12
  5.2 Previous Problem 2: Deriving the F Distribution
  5.3 Practice Problem 1: Companion Form
  5.4 Matlab Help

6 Bias and Consistency
  6.1 The Analogy Principle
  6.2 Bias
  6.3 Consistency
  6.4 Midterm Review

7 Asymptotics
  7.1 Previous Problem: Gaussian Autoregressive Process
  7.2 Practice Problem 1: Hansen 8.1
  7.3 Practice Problem 2: Hansen 8.3
  7.4 Practice Problem 3: Hansen 8.8
  7.5 Matlab Checkpoints

8 Maximum Likelihood Estimation
  8.1 Maximum Likelihood Theory
  8.2 Practice Problem 1: Hansen 10.4
  8.3 Practice Problem 2: Hansen 10.7
  8.4 Practice Problem 3: Hansen 10.8
  8.5 Matlab Help

9 Method of Moments
  9.1 Previous Problem 1: Matlab Theory Part
  9.2 Method of Moments Theory
  9.3 Practice Problem 1: Hansen 11.3
  9.4 Practice Problem 2: Hansen 11.4
  9.5 Matlab Help

10 Hypothesis Testing and Confidence Intervals
  10.1 Previous Problem 1: OLS Estimator
  10.2 Hypothesis Testing
  10.3 Other Tests
  10.4 Practice Problem 1: Hansen 13.3 Extended
  10.5 Some Matlab Help

11 Basics of Regression
  11.1 Previous Problem 1: Hansen 13.1 Extended
  11.2 Regression Theory
  11.3 Matlab Help
  11.4 Previous Problem 2: Hansen II 3.13
  11.5 Practice Problem 1: Hansen II 4.16 Adapted
  11.6 Practice Problem 2: Hansen II 7.1
  11.7 Practice Problem 3: Hansen II 7.15
  11.8 Practice Problem 4: Hansen II 7.23 Adapted

12 Summary of Econometrics I
  12.1 Previous Problem: Hansen II 7.7
  12.2 Maximum Likelihood Estimation: Hansen 10.8
  12.3 Method of Moments Estimation: Hansen 11.3
  12.4 Regression Tests: Hansen 13.3 Extended
  12.5 Final
Preface
This book is a compilation of recitations given throughout the 2021-22 academic year at the University
of Notre Dame for the first-year PhD econometrics sequence in the Department of Economics. They
were originally composed by Alex Houtz, who was the graduate teaching assistant for Drew Creal in
Fall 2021 and Marinho Bertanha in Spring 2022. The materials within may be distributed at any
time to any audience for individual or instructional use, given proper citation to the author. These
materials may not be used for commercial purposes.
Many problems were taken or adapted from Bruce Hansen’s two econometrics books, which can
be found here. The first book is referred to as simply “Hansen” while the second book is referred to
as “Hansen II.” Other textbooks that are drawn from include Wooldridge (found here) and Hayashi
(found here). The remaining problems are either cited in-text or are taken from the homework given
in the lecture class by either Drew Creal or Marinho Bertanha.
The following list documents the chain of teaching assistants at the University of Notre Dame who used and/or edited this book, from earliest to latest:
1. Alex Houtz (2021-2022)
Chapter 1
Introduction to Matlab
1.1 Matlab Basics
We need to begin with the structure of Matlab.
- The "Editor" window (upper right) is where we create the code file. Your saved work will come from here.
- The "Command" window (bottom right) contains output from your code. You can also directly input code here that you don't want saved.
- The "Current Folder" window (upper left) contains the files specified in the path I have declared (see below in Matlab Set-up).
- The "Workspace" window (bottom left) contains saved variables, matrices, arrays, etc.
1.2 Matlab Set-up
There will be many Matlab assignments throughout the year. Establishing a simple, clean set-up to
your code will make all of our lives easier. Feel free to experiment and create your own, but here is an
example:
There are a couple of "tricks" to note:
- The double percent sign, "%%", creates a new section. So in the example, I have the section "Introduction" and the section "Settings".
- The single percent sign, "%", tells Matlab that the rest of the line is a comment and not code.
- The "clear, clc" commands clear the workspace and the command window, ensuring I'm not overwriting variables left over from other files.
- The "cd(...)" command specifies the path on which Matlab will look for data and function files.
- The semi-colon, ";", suppresses output in the command window.
1.3 Some Useful Commands
- Variables must be defined for values, function outputs, and data if you want to reference them later in your code.
- To import data, declare your path and then read the data into Matlab using "readmatrix" or "readtable". See the documentation for more details on these commands.
- For loops will be useful in most Matlab assignments. In the example above, I'm creating two error-term vectors from a multivariate normal distribution. Essentially, Matlab is pulling two random numbers from the distribution 10,000 times. NOTE: Try to avoid triple loops if possible. These can take a long time to run.
1.4 Publishing
Your code needs to be able to run from the beginning to the end with no problems. One easy way to
check this, and also to turn in your code and output, is to publish your code.
Usually we work in the Editor tab. To publish, click the Publish tab and then click on Publish. When the preview pops up, print to PDF. The end result should look similar to the picture below.
Make sure the necessary output is visible in the published PDF. Remember that by using a
semi-colon you suppress the output.
1.5 General Advice
Coding is hard. Do not worry if you are struggling. Find a friend to code with and work through
these together (do make sure you understand what the code is doing).
Try to reduce run times for your code. Most assignments should be able to run in less than a
minute.
The internet is your best friend for Matlab (and code in general). Check out places like Matlab
Answers first.
Make sure to save your work.
Chapter 2
Univariate Probability
2.1 Monty Hall
This problem is important. Last year we had a question on the midterm using the same principles found in the Monty Hall problem. In light of this, I will lay out the reasoning for this problem and solve it below. Make sure you know how to solve this question and any like it.
You are on the game show "Let's Make a Deal" with Monty Hall. Without loss of generality, you pick door A, but it is not immediately opened. To increase the drama, Monty opens one of the two remaining doors, door B (again, without loss of generality), revealing that door B does not hide the grand prize. Monty makes you an offer: either switch your door choice to C or keep door A. Should you switch doors?
Solution: We begin by first defining variables:
- $P$: door hiding the prize
- $S$: door selected by the player
- $H$: door opened by the host
We begin with the assumption that, WLOG, you select door A; label the doors A, B, C as 1, 2, 3, so $S = 1$. The prior probabilities for the location of the prize are:
\[
P(P = 1 \mid S = 1) = P(P = 2 \mid S = 1) = P(P = 3 \mid S = 1) = \frac{1}{3}
\]
Using Bayes' Rule, we know that
\[
P(P = 1 \mid H = 2, S = 1) = \frac{P(H = 2 \mid P = 1, S = 1)\, P(P = 1 \mid S = 1)}{P(H = 2 \mid S = 1)}
\]
and
\[
P(P = 3 \mid H = 2, S = 1) = \frac{P(H = 2 \mid P = 3, S = 1)\, P(P = 3 \mid S = 1)}{P(H = 2 \mid S = 1)}
\]
Note that we are missing the components of the total probability $P(H = 2 \mid S = 1)$. But we know these from the host's behavior: he never opens your door and never reveals the prize, so
\[
P(H = 2 \mid P = 1, S = 1) = \frac{1}{2}, \qquad
P(H = 2 \mid P = 2, S = 1) = 0, \qquad
P(H = 2 \mid P = 3, S = 1) = 1
\]
We can now compute the probabilities we want:
\begin{align*}
P(P = 1 \mid H = 2, S = 1) &= \frac{P(H = 2 \mid P = 1, S = 1)\, P(P = 1 \mid S = 1)}{P(H = 2 \mid S = 1)} \\
&= \frac{\frac{1}{2} \times \frac{1}{3}}{\frac{1}{2} \times \frac{1}{3} + 0 \times \frac{1}{3} + 1 \times \frac{1}{3}} \\
&= \frac{\frac{1}{2}}{\frac{1}{2} + 1} \\
&= \frac{1}{3}
\end{align*}
where $P(H = 2 \mid S = 1) = \sum_{i=1}^{3} P(H = 2 \mid P = i, S = 1)\, P(P = i \mid S = 1)$ via the law of total probability.
Applying the same steps:
\begin{align*}
P(P = 3 \mid H = 2, S = 1) &= \frac{P(H = 2 \mid P = 3, S = 1)\, P(P = 3 \mid S = 1)}{P(H = 2 \mid S = 1)} \\
&= \frac{1 \times \frac{1}{3}}{\frac{1}{2} \times \frac{1}{3} + 0 \times \frac{1}{3} + 1 \times \frac{1}{3}} \\
&= \frac{\frac{1}{3}}{\frac{1}{6} + \frac{1}{3}} \\
&= \frac{2}{3}
\end{align*}
Therefore, the probability the prize is behind your first choice, door A, is $\frac{1}{3}$, while the probability the prize is behind your alternative choice, door C, is $\frac{2}{3}$. Given that you want to win, you should switch your door choice to C.
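The 1/3 versus 2/3 split is easy to verify by simulation. Below is a quick sketch in Python (the recitations themselves use Matlab, so treat this as an illustrative aside; the logic ports directly):

```python
import random

rng = random.Random(0)
trials = 100_000
stay_wins = switch_wins = 0
for _ in range(trials):
    prize = rng.randrange(3)      # door hiding the prize (P)
    pick = 0                      # WLOG the player selects door 0 (S)
    # The host opens a door that is neither the pick nor the prize (H)
    opened = rng.choice([d for d in range(3) if d != pick and d != prize])
    switched = next(d for d in range(3) if d not in (pick, opened))
    stay_wins += (pick == prize)
    switch_wins += (switched == prize)

stay = stay_wins / trials        # theory: 1/3
switch = switch_wins / trials    # theory: 2/3
```

With 100,000 plays, the stay and switch frequencies settle near 1/3 and 2/3, matching the Bayes' Rule calculation.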
2.2 Practice Problem 1: Hansen 2.7
We are given the chi-square distribution. We are tasked with deriving the pdf of the inverse chi-square distribution. The chi-square pdf and the transformation are given below:
\[
f_x(x) = \frac{1}{2^{r/2}\,\Gamma(\frac{r}{2})}\, x^{r/2 - 1} e^{-x/2}, \qquad Y = 1/X
\]
Therefore, we need the change of variables formula:
\[
f_y(y) = \left| \frac{\partial g^{-1}(y)}{\partial y} \right| f_x\!\left( g^{-1}(y) \right)
\]
First we find $g^{-1}(y)$. We know $g(x) = y = 1/x$, so $g^{-1}(y) = 1/y$.
Then we plug this into the chi-square pdf:
\[
f_x(g^{-1}(y)) = \frac{1}{2^{r/2}\,\Gamma(\frac{r}{2})} \left( \frac{1}{y} \right)^{r/2 - 1} e^{-1/(2y)}
\]
Finding the Jacobian next:
\[
\left| \frac{\partial g^{-1}(y)}{\partial y} \right| = \left| -\frac{1}{y^2} \right| = \frac{1}{y^2}
\]
Lastly, putting it all together:
\begin{align*}
f_y(y) &= y^{-2}\, \frac{1}{2^{r/2}\,\Gamma(\frac{r}{2})}\, y^{-r/2 + 1} e^{-1/(2y)} \\
&= \frac{1}{2^{r/2}\,\Gamma(\frac{r}{2})}\, y^{-r/2 - 1} e^{-1/(2y)}
\end{align*}
2.3 Practice Problem 2: Hansen 2.11
Suppose $X$ has density $f(x) = e^{-x}$ for $x > 0$. Set $Y = -\ln(X)$ and find $f_y(y)$.
To solve, we follow the same procedure as before.
First we find $g^{-1}(y)$. We know $g(x) = y = -\ln(x)$, so $g^{-1}(y) = e^{-y}$.
Plugging this into the pdf:
\[
f_x(g^{-1}(y)) = e^{-e^{-y}}
\]
Then finding the Jacobian:
\[
\left| \frac{\partial g^{-1}(y)}{\partial y} \right| = \left| -e^{-y} \right| = e^{-y}
\]
Now altogether:
\[
f_y(y) = e^{-y} e^{-e^{-y}}
\]
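The density $e^{-y} e^{-e^{-y}}$ integrates to the CDF $e^{-e^{-y}}$, so the result can be sanity-checked by simulation. A Python sketch (an aside to these Matlab-based notes; the `ecdf` helper is ours, not a library function):

```python
import math
import random

rng = random.Random(1)
n = 200_000
# Draw X ~ Exp(1) (density e^{-x}) and transform Y = -ln(X)
ys = [-math.log(rng.expovariate(1.0)) for _ in range(n)]

def ecdf(samples, y):
    """Empirical CDF at y (simple helper for this check)."""
    return sum(s <= y for s in samples) / len(samples)

# The derived pdf e^{-y} e^{-e^{-y}} has CDF e^{-e^{-y}}
p0 = ecdf(ys, 0.0)   # theory: exp(-1) ≈ 0.368
p1 = ecdf(ys, 1.0)   # theory: exp(-exp(-1)) ≈ 0.692
```

The empirical CDF of the transformed draws matches $e^{-e^{-y}}$ at both checkpoints, confirming the change of variables.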
2.4 Practice Problem 3: Hansen 3.5
For the exponential distribution, show:
(a) $\int_0^\infty f(x \mid \lambda)\, dx = 1$
(b) $E[X] = \lambda$
(c) $\mathrm{var}(X) = \lambda^2$
2.4.1 Part a
Hansen defines the exponential distribution as follows:
\[
f_x(x) = \frac{1}{\lambda} e^{-x/\lambda}
\]
Plug this into the integral and solve:
\begin{align*}
\int_0^\infty \frac{1}{\lambda} e^{-x/\lambda}\, dx &= \frac{1}{\lambda} \int_0^\infty e^{-x/\lambda}\, dx \\
&= \frac{1}{\lambda} \left[ -\lambda e^{-x/\lambda} \right]_0^\infty \\
&= \frac{1}{\lambda} \left[ -\lambda (0 - 1) \right] \\
&= \frac{\lambda}{\lambda} = 1
\end{align*}
2.4.2 Part b
Using the definition of the expected value, $E[X] = \int_{-\infty}^{\infty} x f_x(x)\, dx$:
\begin{align*}
E[X] &= \int_0^\infty \frac{x}{\lambda} e^{-x/\lambda}\, dx \\
&= \frac{1}{\lambda} \int_0^\infty x e^{-x/\lambda}\, dx
\end{align*}
We need to do integration by parts. So by defining:
\[
u = x, \quad du = dx, \quad dv = e^{-x/\lambda}\, dx, \quad v = -\lambda e^{-x/\lambda}
\]
we can substitute into the integration by parts formula:
\[
\int u\, dv = uv - \int v\, du
\]
Now we have
\[
\frac{1}{\lambda} \left\{ \left[ -x\lambda e^{-x/\lambda} \right]_0^\infty + \lambda \int_0^\infty e^{-x/\lambda}\, dx \right\}
\]
We use L'Hopital's Rule to evaluate the first term at the upper limit:
\[
\lim_{x \to \infty} x\lambda e^{-x/\lambda} = \lambda \lim_{x \to \infty} \frac{x}{e^{x/\lambda}}
\]
This puts the expression in an $\infty/\infty$ form. Taking the derivative of the top and bottom:
\[
\lambda \lim_{x \to \infty} \frac{1}{\frac{1}{\lambda} e^{x/\lambda}} = 0
\]
Back to the problem:
\begin{align*}
\frac{1}{\lambda} \left\{ \left[ -x\lambda e^{-x/\lambda} \right]_0^\infty + \lambda \int_0^\infty e^{-x/\lambda}\, dx \right\}
&= \frac{1}{\lambda} \left\{ [0 - 0] + \lambda \left[ -\lambda e^{-x/\lambda} \right]_0^\infty \right\} \\
&= \frac{\lambda^2}{\lambda} = \lambda
\end{align*}
2.4.3 Part c
Recall the variance decomposition formula: $\mathrm{Var}(X) = E[X^2] - E[X]^2$. We know $E[X]$, so all we need is $E[X^2]$. The steps are very similar to the above:
\[
E[X^2] = \frac{1}{\lambda} \int_0^\infty x^2 e^{-x/\lambda}\, dx
\]
We need to do integration by parts again. Define
\[
u = x^2, \quad du = 2x\, dx, \quad dv = e^{-x/\lambda}\, dx, \quad v = -\lambda e^{-x/\lambda}
\]
Continuing on by plugging these into the integration by parts formula:
\[
= \frac{1}{\lambda} \left\{ \left[ -\lambda x^2 e^{-x/\lambda} \right]_0^\infty + 2\lambda \int_0^\infty x e^{-x/\lambda}\, dx \right\}
\]
We can use L'Hopital's Rule to show that the first term goes to 0 again. Even better, the second term is the same integral we solved when finding $E[X]$! Substituting everything in:
\begin{align*}
&= \frac{1}{\lambda} \left[ 0 + 2\lambda^3 \right] \\
&= 2\lambda^2
\end{align*}
Plugging $E[X^2]$ and $E[X]$ into the variance decomposition formula, we get:
\[
\mathrm{Var}(X) = E[X^2] - E[X]^2 = 2\lambda^2 - \lambda^2 = \lambda^2
\]
2.5 Practice Problem 4: Hansen 3.9
For the gamma distribution, show:
(a) $\int_0^\infty f(x \mid \alpha, \beta)\, dx = 1$
(b) $E[X] = \frac{\alpha}{\beta}$
(c) $\mathrm{var}(X) = \frac{\alpha}{\beta^2}$
2.5.1 Part a
We begin by noting that the pdf of a gamma distribution is
\[
f_x(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}
\]
Subbing this into the integral:
\[
\int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}\, dx
\]
Using property 3 in Hansen Appendix A.28, $\int_0^\infty x^{\alpha - 1} e^{-\lambda x}\, dx = \frac{\Gamma(\alpha)}{\lambda^\alpha}$, we get:
\[
= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{\beta^\alpha} = 1
\]
2.5.2 Part b
Use the definition for the expected value of a random variable:
\begin{align*}
E[X] &= \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha} e^{-\beta x}\, dx \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + 1)}{\beta^{\alpha + 1}} \quad \text{(Prop. 3)} \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\alpha\,\Gamma(\alpha)}{\beta^{\alpha + 1}} \quad \text{(Prop. 2)} \\
&= \frac{\alpha}{\beta}
\end{align*}
2.5.3 Part c
Using the variance decomposition equation, we again only need $E[X^2]$. So:
\begin{align*}
E[X^2] &= \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha + 1} e^{-\beta x}\, dx \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + 2)}{\beta^{\alpha + 2}} \quad \text{(Prop. 3)} \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{(\alpha + 1)\,\Gamma(\alpha + 1)}{\beta^{\alpha + 2}} \quad \text{(Prop. 2)} \\
&= \frac{\beta^\alpha}{\Gamma(\alpha)} \cdot \frac{(\alpha + 1)\,\alpha\,\Gamma(\alpha)}{\beta^{\alpha + 2}} \quad \text{(Prop. 2)} \\
&= \frac{\alpha^2 + \alpha}{\beta^2}
\end{align*}
Substitute into the variance decomposition formula:
\[
\mathrm{Var}(X) = \frac{\alpha^2 + \alpha}{\beta^2} - \frac{\alpha^2}{\beta^2} = \frac{\alpha}{\beta^2}
\]
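As with the exponential, a simulation confirms these moments. A Python sketch (an aside; Python's `random.gammavariate` takes a *scale* parameter, while Hansen's $\beta$ is a *rate*, so we pass $1/\beta$):

```python
import random
import statistics

rng = random.Random(3)
alpha, beta = 3.0, 2.0          # Hansen's beta is a rate
n = 200_000
# random.gammavariate takes (shape, scale), so scale = 1/beta
xs = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n)]

mean_x = statistics.fmean(xs)       # theory: alpha/beta = 1.5
var_x = statistics.pvariance(xs)    # theory: alpha/beta**2 = 0.75
```

The parameterization mismatch (rate versus scale) is a classic source of bugs in both Python and Matlab, which is why the conversion is worth spelling out.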
Chapter 3
Multivariate Probability
3.1 Expected Value of a Log-Normal Distribution
Given that the transformation of a variable $Y \sim N(\mu, \sigma^2)$ given by $X = e^Y$ results in a log-normal distribution, find the expected value of $X$.
3.1.1 Solution:
Note that $E[X] = E[e^Y]$. As a result, we can simply use the normal distribution instead of the log-normal distribution:
\begin{align*}
E[X] = E[e^Y] &= \int_{-\infty}^{\infty} \frac{e^y}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - \mu)^2}{2\sigma^2}}\, dy \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - \mu)^2}{2\sigma^2} + y}\, dy \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{y^2 - 2\mu y + \mu^2 - 2\sigma^2 y}{2\sigma^2}}\, dy \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{y^2 - 2(\mu + \sigma^2)y + \mu^2}{2\sigma^2}}\, dy
\end{align*}
So far this is just algebra. We now complete the square inside the exponential. We want the end result to contain $\left( y - (\mu + \sigma^2) \right)^2$. Multiplying that out gives us: $y^2 - 2(\mu + \sigma^2)y + \mu^2 + 2\sigma^2\mu + \sigma^4$, so we must add and subtract $2\sigma^2\mu + \sigma^4$ in the numerator of the exponent. This gives us:
\begin{align*}
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - (\mu + \sigma^2))^2 - 2\sigma^2\mu - \sigma^4}{2\sigma^2}}\, dy \\
&= e^{\mu + \frac{1}{2}\sigma^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y - (\mu + \sigma^2))^2}{2\sigma^2}}\, dy
\end{align*}
But note that the part left over inside the integral is just a normal pdf! We know that this integrates to 1, which leaves us with:
\[
E[X] = e^{\mu + \frac{1}{2}\sigma^2}
\]
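The formula $E[e^Y] = e^{\mu + \sigma^2/2}$ is easy to forget and easy to check. A Python sketch of the check (an aside to these Matlab-based notes):

```python
import math
import random
import statistics

rng = random.Random(4)
mu, sigma = 0.3, 0.8
n = 200_000
# Draw Y ~ N(mu, sigma^2) and exponentiate to get log-normal X
xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

mc = statistics.fmean(xs)                   # Monte Carlo estimate of E[X]
theory = math.exp(mu + 0.5 * sigma**2)      # e^{mu + sigma^2/2}
```

Note the common pitfall the derivation guards against: $E[e^Y] \ne e^{E[Y]}$; the extra $\sigma^2/2$ comes from completing the square.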
3.2 Practice Problem 1: Hansen 4.1
Let $f(x, y) = 1/4$ for $-1 \le x \le 1$ and $-1 \le y \le 1$.
(a) Verify that $f(x, y)$ is a valid density function
(b) Find the marginal density of $X$
(c) Find the conditional density of $Y$ given $X = x$
(d) Find $E[Y \mid X = x]$
(e) Determine $P(X^2 + Y^2 \le 1)$
(f) Determine $P(|X + Y| < 2)$
3.2.1 Solution: Part a
We need to show that the joint pdf integrates to 1. So:
\begin{align*}
\int_{-1}^{1} \int_{-1}^{1} \frac{1}{4}\, dy\, dx &= \int_{-1}^{1} \left[ \frac{1}{4} y \right]_{-1}^{1} dx \\
&= \int_{-1}^{1} \frac{1}{2}\, dx \\
&= \left[ \frac{1}{2} x \right]_{-1}^{1} \\
&= \frac{1}{2} + \frac{1}{2} = 1
\end{align*}
3.2.2 Solution: Part b
To find the marginal distribution of $X$, we simply integrate out $Y$:
\begin{align*}
f_x(x) &= \int_{-1}^{1} \frac{1}{4}\, dy \\
&= \left[ \frac{1}{4} y \right]_{-1}^{1} \\
&= \frac{1}{4} + \frac{1}{4} = \frac{1}{2}
\end{align*}
Note that due to symmetry, $f_y(y)$ is the same as $f_x(x)$.
3.2.3 Solution: Part c
We use the conditional density formula $f_{y|x}(y \mid x) = \frac{f(x, y)}{f_x(x)}$:
\[
f_{y|x}(y \mid x) = \frac{1/4}{1/2} = \frac{1}{2}
\]
3.2.4 Solution: Part d
Recall that $E[Y \mid X = x] = \int_Y y\, f(y \mid x)\, dy$. Applying this formula:
\begin{align*}
E[Y \mid X = x] &= \int_{-1}^{1} \frac{1}{2} y\, dy \\
&= \left[ \frac{1}{4} y^2 \right]_{-1}^{1} \\
&= \frac{1}{4} - \frac{1}{4} = 0
\end{align*}
This is the same as $E[Y]$. Why? Because $f_{y|x}(y \mid x) = f_y(y)$, meaning that $X \perp Y$.
3.2.5 Solution: Part e
There are two ways to do this problem. I will use integrals for (e) and solve (f) graphically.
\begin{align*}
P(X^2 + Y^2 \le 1) &= P(X^2 \le 1 - Y^2) \\
&= P\!\left( -\sqrt{1 - y^2} \le X \le \sqrt{1 - y^2} \right) \\
&= \int_{-1}^{1} \int_{-\sqrt{1 - y^2}}^{\sqrt{1 - y^2}} \frac{1}{4}\, dx\, dy \\
&= \int_{-1}^{1} \frac{\sqrt{1 - y^2}}{2}\, dy \\
&= \left[ \frac{\arcsin(y) + y\sqrt{1 - y^2}}{4} \right]_{-1}^{1} \quad \text{using an integral calculator} \\
&= \frac{\pi}{4}
\end{align*}
Graphically, we can calculate the area of the square from the support of $X$ and $Y$ and the area of the circle enclosed by $X^2 + Y^2 \le 1$. Doing this yields an area of 4 for the square and an area of $\pi$ for the circle. Normalizing the space to an area of 1 by dividing by 4 leaves us with a probability equal to $\frac{\pi}{4}$, confirming our integration solution.
3.2.6 Solution: Part f
As stated above, we will solve this graphically, as solving via integration is tricky. Drawing a picture reveals the simplicity of this problem: the shaded region would be every combination of $X$ and $Y$ such that $|X + Y| < 2$. Since the support of each variable is between $-1$ and $1$, the entire square satisfies the event (the corners where $|X + Y| = 2$ have probability zero), so $P(|X + Y| < 2) = 1$.
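The $\pi/4$ answer from part (e) also falls out of a two-line simulation, since the joint density is uniform on the square. A Python sketch (an aside; this is just the classic "darts" estimate of $\pi$):

```python
import math
import random

rng = random.Random(5)
n = 400_000
hits = 0
for _ in range(n):
    # (X, Y) uniform on the square [-1, 1] x [-1, 1], matching f(x, y) = 1/4
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    hits += (x * x + y * y <= 1)
p = hits / n        # theory: pi/4 ≈ 0.785
```

The fraction of draws landing inside the unit circle estimates $P(X^2 + Y^2 \le 1)$, i.e., the circle-to-square area ratio from the graphical argument.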
3.3 Practice Problem 2: Hansen 4.7
Let $X$ and $Y$ have joint density $f(x, y) = e^{-x - y}$ for $x > 0$ and $y > 0$. Find $f_x(x)$ and $f_y(y)$ to determine whether $X \perp Y$.
3.3.1 Solution:
To find the marginal densities, we integrate out the other variable. So:
\begin{align*}
f_x(x) &= \int_0^\infty e^{-x - y}\, dy \\
&= \left[ -e^{-x - y} \right]_0^\infty \\
&= 0 + e^{-x} = e^{-x}
\end{align*}
Similarly:
\begin{align*}
f_y(y) &= \int_0^\infty e^{-x - y}\, dx \\
&= \left[ -e^{-x - y} \right]_0^\infty \\
&= 0 + e^{-y} = e^{-y}
\end{align*}
Recall that $X$ and $Y$ are independent if $f_x(x) \cdot f_y(y) = f(x, y)$:
\[
f_x(x) \cdot f_y(y) = e^{-x} \cdot e^{-y} = e^{-x - y} = f(x, y)
\]
So $X$ and $Y$ are independent: $X \perp Y$.
3.4 Practice Problem 3: Multivariate Change of Variables
Using the joint pdf from the previous problem ($f(x, y) = e^{-x - y}$), apply the two transformations $Z = X - Y$ and $W = X + Y$. Find the marginal distributions for $Z$ and $W$. (Credit to Penn State.)
3.4.1 Solution: Finding the inverse functions
We want $X$ and $Y$ as functions of $Z$ and $W$. Isolating $X$ in each transformation equation gives $X = Z + Y$ and $X = W - Y$. Setting these equal:
\begin{align*}
Z + Y &= W - Y \\
2Y &= W - Z \\
Y &= \frac{W - Z}{2}
\end{align*}
Similarly, isolating $Y$ gives $Y = X - Z$ and $Y = W - X$. Setting these equal:
\begin{align*}
X - Z &= W - X \\
2X &= W + Z \\
X &= \frac{W + Z}{2}
\end{align*}
3.4.2 Solution: Plug inverse functions into the joint pdf
\[
f_{xy}(x, y) = e^{-x - y}
\]
\[
f\!\left( g^{-1}(z, w),\, h^{-1}(z, w) \right) = e^{-\frac{w + z}{2} - \frac{w - z}{2}} = e^{-\frac{2w}{2}} = e^{-w}
\]
3.4.3 Solution: The Jacobian
The Jacobian is the same as always:
\[
|J| = \begin{vmatrix} \frac{\partial x}{\partial z} & \frac{\partial x}{\partial w} \\[2pt] \frac{\partial y}{\partial z} & \frac{\partial y}{\partial w} \end{vmatrix}
= \begin{vmatrix} \frac{1}{2} & \frac{1}{2} \\[2pt] -\frac{1}{2} & \frac{1}{2} \end{vmatrix}
= \left| \frac{1}{4} + \frac{1}{4} \right| = \frac{1}{2}
\]
3.4.4 Solution: Substitute into the change of variables formula
Using the same formula as before:
\[
f_{zw}(z, w) = e^{-w} \cdot \frac{1}{2} = \frac{e^{-w}}{2}
\]
Now we have our joint pdf!
3.4.5 Solution: Finding the marginal pdfs
First we need to figure out the support for $W$ and $Z$. We know that:
\[
0 < x < \infty \;\Rightarrow\; 0 < \frac{w + z}{2} < \infty \;\Rightarrow\; 0 < w + z < \infty
\]
\[
0 < y < \infty \;\Rightarrow\; 0 < \frac{w - z}{2} < \infty \;\Rightarrow\; 0 < w - z < \infty
\]
From here we can determine that
\[
-w < z < w, \qquad 0 < w < \infty
\]
We are ready to find the marginal distribution of $W$:
\begin{align*}
f_w(w) &= \int_{-w}^{w} \frac{e^{-w}}{2}\, dz \\
&= \frac{1}{2} e^{-w} \left[ z \right]_{-w}^{w} \\
&= \frac{1}{2} e^{-w} w + \frac{1}{2} e^{-w} w \\
&= w e^{-w} \quad \text{for } 0 < w < \infty
\end{align*}
One more to go! We need to re-evaluate the support before we can continue. For a fixed $z$, the constraints $w > z$ and $w > -z$ mean we can write the support as:
\[
|z| < w < \infty, \quad \text{with } -\infty < z < \infty
\]
So:
\[
f_z(z) =
\begin{cases}
\displaystyle \int_{-z}^{\infty} \frac{e^{-w}}{2}\, dw & \text{for } z < 0 \\[8pt]
\displaystyle \int_{z}^{\infty} \frac{e^{-w}}{2}\, dw & \text{for } z > 0
\end{cases}
=
\begin{cases}
\frac{1}{2} \left[ -e^{-w} \right]_{-z}^{\infty} & \text{for } z < 0 \\[4pt]
\frac{1}{2} \left[ -e^{-w} \right]_{z}^{\infty} & \text{for } z > 0
\end{cases}
=
\begin{cases}
\frac{1}{2} e^{z} & \text{for } z < 0 \\[4pt]
\frac{1}{2} e^{-z} & \text{for } z > 0
\end{cases}
\]
If we combine these two cases, we arrive at our answer:
\[
f_z(z) = \frac{1}{2} e^{-|z|}
\]
This is the Laplace distribution's pdf with a location parameter of zero and a scale parameter of 1.
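The Laplace claim can be verified by simulating the transformation directly. A Python sketch (an aside; it uses the Laplace(0,1) facts that the variance is 2 and the CDF at $z > 0$ is $1 - e^{-z}/2$):

```python
import math
import random
import statistics

rng = random.Random(6)
n = 200_000
# Z = X - Y with X, Y iid Exp(1), matching this section's transformation
zs = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

var_z = statistics.pvariance(zs)      # Laplace(0,1) variance: 2
p1 = sum(z <= 1 for z in zs) / n      # Laplace(0,1) CDF at 1: 1 - e^{-1}/2
```

Both the spread and the tail probability of the simulated differences match the Laplace(0, 1) benchmarks.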
Chapter 4
Dependent Random Variables
4.1 Previous Problem: Hansen 4.14
Let $X_1 \sim \Gamma(r, 1)$ and $X_2 \sim \Gamma(s, 1)$ be independent. Find the distribution of $Y = X_1 + X_2$.
4.1.1 Solution:
Many of you did change of variables on the last homework. That approach works well and should be your first instinct. Using change of variables here, though, is not particularly elegant. A more efficient method of arriving at the answer is via the moment generating function of a gamma random variable. We begin by finding that, for $X \sim \Gamma(\alpha, \beta)$:
\begin{align*}
M(t) = E[e^{tX}] &= \int_0^\infty e^{tx}\, \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha - 1} e^{-x/\beta}\, dx \\
&= \frac{1}{\Gamma(\alpha)\beta^\alpha} \int_0^\infty x^{\alpha - 1} e^{-x/\beta + tx}\, dx \\
&= \frac{1}{\Gamma(\alpha)\beta^\alpha} \int_0^\infty x^{\alpha - 1} e^{-\left[ \frac{1}{\beta} - t \right] x}\, dx
\end{align*}
Note that we can use the gamma function's properties to evaluate the integral. Using property 3 in Hansen:
\begin{align*}
&= \frac{1}{\Gamma(\alpha)\beta^\alpha} \cdot \Gamma(\alpha) \left( \frac{\beta}{1 - \beta t} \right)^{\alpha} \\
&= \left( \frac{1}{1 - \beta t} \right)^{\alpha}
\end{align*}
Okay, now that we know the MGF for a gamma random variable (which we could also just look up on the web), we can begin to solve. We start with finding the MGF of $Y$:
\begin{align*}
E[e^{tY}] &= E[e^{t(X_1 + X_2)}] \\
&= E[e^{tX_1 + tX_2}] \\
&= E[e^{tX_1} e^{tX_2}] \\
\intertext{Now, due to independence:}
&= E[e^{tX_1}]\, E[e^{tX_2}] \\
&= \left( \frac{1}{1 - t} \right)^{r} \cdot \left( \frac{1}{1 - t} \right)^{s} \\
&= \left( \frac{1}{1 - t} \right)^{r + s}
\end{align*}
But this should look familiar! After all, it is just the MGF of a gamma random variable with a shape parameter of $r + s$. Therefore $Y \sim \Gamma(r + s, 1)$.
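A simulation makes the MGF conclusion concrete: a $\Gamma(r + s, 1)$ variable has mean and variance both equal to $r + s$, so the sum of draws should show exactly that. A Python sketch (an aside; `random.gammavariate(shape, scale)` with scale 1 matches $\Gamma(\cdot, 1)$ here):

```python
import random
import statistics

rng = random.Random(7)
r, s = 2.0, 3.5
n = 200_000
# Sum of independent Gamma(r, 1) and Gamma(s, 1) draws
ys = [rng.gammavariate(r, 1.0) + rng.gammavariate(s, 1.0) for _ in range(n)]

mean_y = statistics.fmean(ys)       # Gamma(r+s, 1) mean: r + s = 5.5
var_y = statistics.pvariance(ys)    # Gamma(r+s, 1) variance: r + s = 5.5
```

Matching the first two moments is of course weaker than matching the whole MGF, but it is a fast sanity check on the $\Gamma(r + s, 1)$ result.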
4.2 Practice Problem 1: Hansen 4.19
Consider the hierarchical distribution
\[
X \mid N \sim \chi^2_{2N}, \qquad N \sim \text{Poisson}(\lambda)
\]
Find
(a) $E[X]$
(b) $\mathrm{Var}(X)$
4.2.1 Solution: Part a
We use the law of iterated expectations: $E[X] = E_Y\!\left[ E[X \mid Y] \right]$.
\begin{align*}
E[X] &= E_N\!\left[ E[X \mid N] \right] \\
&= E[2N] \\
&= 2\lambda
\end{align*}
4.2.2 Solution: Part b
We use the law of total variance: $\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$.
\begin{align*}
\mathrm{Var}(X) &= E[\mathrm{Var}(X \mid N)] + \mathrm{Var}(E[X \mid N]) \\
&= E[4N] + \mathrm{Var}(2N) \\
&= 4\lambda + 4\,\mathrm{Var}(N) \\
&= 4\lambda + 4\lambda \\
&= 8\lambda
\end{align*}
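Hierarchical answers like $2\lambda$ and $8\lambda$ are well suited to a simulation check, since we can sample the hierarchy directly. A Python sketch (an aside; the `poisson` helper implements Knuth's sampler, and it uses the fact that $\chi^2_{2N}$ is Gamma with shape $N$ and scale 2, degenerate at 0 when $N = 0$):

```python
import math
import random
import statistics

rng = random.Random(8)
lam = 1.5
n = 200_000

def poisson(rng, lam):
    """Knuth's Poisson sampler (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def draw_x(rng):
    nn = poisson(rng, lam)
    # chi-square with 2N degrees of freedom is Gamma(shape N, scale 2);
    # N = 0 gives the degenerate value 0
    return rng.gammavariate(nn, 2.0) if nn > 0 else 0.0

xs = [draw_x(rng) for _ in range(n)]
mean_x = statistics.fmean(xs)       # theory: 2*lam = 3
var_x = statistics.pvariance(xs)    # theory: 8*lam = 12
```

The simulated mean and variance line up with the law-of-iterated-expectations and law-of-total-variance answers.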
4.3 Practice Problem 2: AR(2) Process
An AR(2) process relates a random variable in time period $t$ to two lags of that random variable. We usually write this as $y_t = d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$ where $\varepsilon_t \sim WN(0, \sigma^2)$. Recall that white noise implies that $\varepsilon_t$ has no autocorrelation but does not imply that $\varepsilon_t$ is independent of its past.
(a) Find $E[y_t]$
(b) Find $\mathrm{Var}(y_t)$
(c) Find $\gamma_0$, $\gamma_1$, and $\gamma_2$
(d) Find the impulse responses to a shock $\varepsilon_t$ for $k \in \{0, 1, 2, 3, 4\}$
(e) Find $E[y_{t+3} \mid t]$
4.3.1 Solution: Part a
We just take the expectation of the AR(2) equation:
\begin{align*}
E[y_t] &= E[d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t] \\
\mu_y &= d + \phi_1 E[y_{t-1}] + \phi_2 E[y_{t-2}] + 0 \\
\mu_y &= d + (\phi_1 + \phi_2)\mu_y \\
\mu_y (1 - \phi_1 - \phi_2) &= d \\
\mu_y &= \frac{d}{1 - \phi_1 - \phi_2}
\end{align*}
4.3.2 Solution: Part b
Similarly, we take the variance of the AR(2) equation:
\begin{align*}
\mathrm{Var}(y_t) &= \mathrm{Var}(d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t) \\
\gamma_0 &= \phi_1^2\, \mathrm{Var}(y_{t-1}) + \phi_2^2\, \mathrm{Var}(y_{t-2}) + 2\phi_1\phi_2\, \mathrm{Cov}(y_{t-1}, y_{t-2}) + \sigma^2 \\
\gamma_0 &= (\phi_1^2 + \phi_2^2)\gamma_0 + 2\phi_1\phi_2 \gamma_1 + \sigma^2
\end{align*}
4.3.3 Solution: Part c
In class, Drew proved that the autocorrelation function for an AR(2) is given by $\rho_j = \phi_1 \rho_{j-1} + \phi_2 \rho_{j-2}$; multiplying through by $\gamma_0$, the autocovariances satisfy the same recursion. Using this equation we can find what we need. First, let's find $\gamma_1$:
\begin{align*}
\gamma_1 &= \phi_1 \gamma_0 + \phi_2 \gamma_{-1} \qquad \text{(recall that } \gamma_{-i} = \gamma_i \text{)} \\
\gamma_1 &= \phi_1 \gamma_0 + \phi_2 \gamma_1 \\
\gamma_1 &= \frac{\phi_1 \gamma_0}{1 - \phi_2}
\end{align*}
Plugging this into the equation we found for $\gamma_0$ in part b:
\begin{align*}
\gamma_0 &= (\phi_1^2 + \phi_2^2)\gamma_0 + \frac{2\phi_1^2 \phi_2}{1 - \phi_2}\gamma_0 + \sigma^2 \\
\gamma_0 \left( 1 - \phi_1^2 - \phi_2^2 - \frac{2\phi_1^2 \phi_2}{1 - \phi_2} \right) &= \sigma^2 \\
\gamma_0 &= \frac{\sigma^2}{1 - \phi_1^2 - \phi_2^2 - \frac{2\phi_1^2 \phi_2}{1 - \phi_2}}
\end{align*}
Plug this into the expression for $\gamma_1$:
\[
\gamma_1 = \frac{\phi_1}{1 - \phi_2} \left[ \frac{\sigma^2}{1 - \phi_1^2 - \phi_2^2 - \frac{2\phi_1^2 \phi_2}{1 - \phi_2}} \right]
\]
Lastly, use the ACF recursion to find $\gamma_2$:
\begin{align*}
\gamma_2 &= \phi_1 \gamma_1 + \phi_2 \gamma_0 \\
&= \frac{\phi_1^2}{1 - \phi_2} \left[ \frac{\sigma^2}{1 - \phi_1^2 - \phi_2^2 - \frac{2\phi_1^2 \phi_2}{1 - \phi_2}} \right] + \phi_2 \left[ \frac{\sigma^2}{1 - \phi_1^2 - \phi_2^2 - \frac{2\phi_1^2 \phi_2}{1 - \phi_2}} \right]
\end{align*}
What if we did not have the ACF provided? We could always calculate $\gamma_0$, $\gamma_1$, and $\gamma_2$ by brute force. First, we rewrite the AR(2):
\begin{align*}
y_t &= d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t \\
y_t &= \mu_y (1 - \phi_1 - \phi_2) + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t \\
(y_t - \mu_y) &= \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t
\end{align*}
If we wanted to calculate the unconditional variance $\gamma_0$, for example, then:
\begin{align*}
(y_t - \mu_y)^2 &= \left[ \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right](y_t - \mu_y) \\
E\!\left[ (y_t - \mu_y)^2 \right] &= E\!\left[ \left( \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right)(y_t - \mu_y) \right] = \gamma_0
\end{align*}
What if we wanted to calculate $\gamma_1$?
\begin{align*}
(y_t - \mu_y)(y_{t-1} - \mu_y) &= \left[ \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right](y_{t-1} - \mu_y) \\
E\!\left[ (y_t - \mu_y)(y_{t-1} - \mu_y) \right] &= E\!\left[ \left( \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right)(y_{t-1} - \mu_y) \right] = \gamma_1
\end{align*}
Similarly for $\gamma_2$:
\begin{align*}
(y_t - \mu_y)(y_{t-2} - \mu_y) &= \left[ \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right](y_{t-2} - \mu_y) \\
E\!\left[ (y_t - \mu_y)(y_{t-2} - \mu_y) \right] &= E\!\left[ \left( \phi_1 (y_{t-1} - \mu_y) + \phi_2 (y_{t-2} - \mu_y) + \varepsilon_t \right)(y_{t-2} - \mu_y) \right] = \gamma_2
\end{align*}
4.3.4 Solution: Part d
Impulse responses are defined as $\frac{\partial y_{t+k}}{\partial \varepsilon_t}$. Let's find this for $k = 0$ first:
\begin{align*}
y_t &= d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t \\
\frac{\partial y_t}{\partial \varepsilon_t} &= 1
\end{align*}
Now for $k = 1$:
\begin{align*}
y_{t+1} &= d + \phi_1 y_t + \phi_2 y_{t-1} + \varepsilon_{t+1} \\
y_{t+1} &= d + \phi_1 (d + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t) + \phi_2 y_{t-1} + \varepsilon_{t+1} \\
\frac{\partial y_{t+1}}{\partial \varepsilon_t} &= \phi_1
\end{align*}
For $k = 2$:
\begin{align*}
y_{t+2} &= d + \phi_1 y_{t+1} + \phi_2 y_t + \varepsilon_{t+2} \\
\frac{\partial y_{t+2}}{\partial \varepsilon_t} &= \phi_1 \cdot \frac{\partial y_{t+1}}{\partial \varepsilon_t} + \phi_2 \cdot \frac{\partial y_t}{\partial \varepsilon_t} = \phi_1^2 + \phi_2
\end{align*}
And for $k = 3$:
\begin{align*}
y_{t+3} &= d + \phi_1 y_{t+2} + \phi_2 y_{t+1} + \varepsilon_{t+3} \\
\frac{\partial y_{t+3}}{\partial \varepsilon_t} &= \phi_1 \cdot \frac{\partial y_{t+2}}{\partial \varepsilon_t} + \phi_2 \cdot \frac{\partial y_{t+1}}{\partial \varepsilon_t} = \phi_1^3 + 2\phi_1\phi_2
\end{align*}
Finally for $k = 4$:
\begin{align*}
y_{t+4} &= d + \phi_1 y_{t+3} + \phi_2 y_{t+2} + \varepsilon_{t+4} \\
\frac{\partial y_{t+4}}{\partial \varepsilon_t} &= \phi_1 \cdot \frac{\partial y_{t+3}}{\partial \varepsilon_t} + \phi_2 \cdot \frac{\partial y_{t+2}}{\partial \varepsilon_t} = \phi_1^4 + 3\phi_1^2\phi_2 + \phi_2^2
\end{align*}
As you can see, calculating impulse responses for an AR(2) by hand is much more labor-intensive than calculating them for an AR(1).
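In code, though, the same recursion $\psi_k = \phi_1 \psi_{k-1} + \phi_2 \psi_{k-2}$ does all the work. A Python sketch with illustrative coefficient values (an aside; the recitations would do this in Matlab):

```python
def ar2_irf(phi1, phi2, horizon):
    """Impulse responses of an AR(2): psi_k = phi1*psi_{k-1} + phi2*psi_{k-2},
    with psi_0 = 1 and psi_1 = phi1."""
    psi = [1.0, phi1]
    for _ in range(2, horizon + 1):
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    return psi[: horizon + 1]

phi1, phi2 = 0.5, 0.3   # hypothetical coefficients for illustration
irf = ar2_irf(phi1, phi2, 4)
```

Running this reproduces the hand-derived responses: $1$, $\phi_1$, $\phi_1^2 + \phi_2$, $\phi_1^3 + 2\phi_1\phi_2$, and $\phi_1^4 + 3\phi_1^2\phi_2 + \phi_2^2$.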
4.3.5 Solution: Part e
This is similar to calculating the unconditional moment. Now, though, we know all the variables up to time $t$. So:
\begin{align*}
E[y_{t+3} \mid t] &= E[d + \phi_1 y_{t+2} + \phi_2 y_{t+1} + \varepsilon_{t+3} \mid t] \\
&= d + \phi_1 E[y_{t+2} \mid t] + \phi_2 E[y_{t+1} \mid t] + 0 \\
&= d + \phi_1 E[d + \phi_1 y_{t+1} + \phi_2 y_t + \varepsilon_{t+2} \mid t] + \phi_2 E[d + \phi_1 y_t + \phi_2 y_{t-1} + \varepsilon_{t+1} \mid t] \\
&= d + \phi_1 \left( d + \phi_2 y_t + \phi_1 E[d + \phi_1 y_t + \phi_2 y_{t-1} + \varepsilon_{t+1} \mid t] \right) + \phi_2 (d + \phi_1 y_t + \phi_2 y_{t-1}) \\
&= d + \phi_1 d + \phi_1\phi_2 y_t + \phi_1^2 d + \phi_1^3 y_t + \phi_1^2\phi_2 y_{t-1} + \phi_2 d + \phi_1\phi_2 y_t + \phi_2^2 y_{t-1} \\
&= d(1 + \phi_1 + \phi_1^2 + \phi_2) + y_t(\phi_1^3 + 2\phi_1\phi_2) + y_{t-1}(\phi_1^2\phi_2 + \phi_2^2)
\end{align*}
Notice how we treated any variable dated at time $t$ or before as if it were a constant. That's the key to solving conditional expectations or variances with respect to time.
4.4 Practice Problem 3: Deriving the MA(∞) Form for an AR(1)
Drew went over this in class, but this is an important wrench to have in your toolbox. The MA(∞) form splits your autoregressive process into the stationary mean and a sum of impulse shocks. We start with a basic AR(1):
\[
y_t = d + \phi y_{t-1} + \varepsilon_t
\]
Recursively sub in:
\begin{align*}
&= d + \phi(d + \phi y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\
&= d(1 + \phi) + \phi^2 y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t \\
&= d(1 + \phi) + \phi^2 (d + \phi y_{t-3} + \varepsilon_{t-2}) + \phi \varepsilon_{t-1} + \varepsilon_t \\
&= d(1 + \phi + \phi^2) + \phi^3 y_{t-3} + \phi^2 \varepsilon_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t \\
&= d(1 + \phi + \phi^2) + \phi^3 (d + \phi y_{t-4} + \varepsilon_{t-3}) + \phi^2 \varepsilon_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t \\
&= d(1 + \phi + \phi^2 + \phi^3) + \phi^4 y_{t-4} + \phi^3 \varepsilon_{t-3} + \phi^2 \varepsilon_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t
\end{align*}
We see a pattern emerging here. Using induction, we can write this process as:
\[
= d \sum_{i=1}^{\infty} \phi^{i-1} + \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}
\]
Using the fact that $\sum_{k=0}^{\infty} a r^k = \frac{a}{1 - r}$, and making the assumption that $|\phi| < 1$, this becomes:
\[
= \frac{d}{1 - \phi} + \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}
\]
We can simplify further. Solving for the mean of an AR(1) process:
\begin{align*}
E[y_t] &= E[d + \phi y_{t-1} + \varepsilon_t] \\
\mu_y &= d + \phi E[y_{t-1}] + 0 \\
\mu_y &= d + \phi \mu_y \\
\mu_y &= \frac{d}{1 - \phi}
\end{align*}
Therefore, the MA(∞) form becomes:
\[
y_t = \mu_y + \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}
\]
where $\mu_y$ is the unconditional mean of $y$ and the sum contains the impulse responses to past shocks $\varepsilon$.
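The equivalence between the recursion and the MA(∞) representation can be demonstrated numerically: run the AR(1) recursion on one shock path, then rebuild the last observation from the (truncated) MA sum. A Python sketch (an aside; the series is started at $\mu_y$ so the start-up term vanishes and the finite sum is exact up to rounding):

```python
import random

rng = random.Random(9)
d, phi = 1.0, 0.6
T = 300
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]

# AR(1) recursion started at the unconditional mean mu = d/(1 - phi)
mu = d / (1 - phi)
y = [mu]
for t in range(1, T):
    y.append(d + phi * y[t - 1] + eps[t])

# MA form: y_t = mu + sum_{j=0}^{t-1} phi^j eps_{t-j} when y_0 = mu
t = T - 1
ma = mu + sum(phi ** j * eps[t - j] for j in range(t))
gap = abs(y[t] - ma)
```

The two constructions agree to floating-point precision, which is exactly what the recursive substitution argument above says they should do.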
Chapter 5
Dependent Vectors of Random
Variables
5.1 Previous Problem 1: Hansen 5.12
Show that the MGF of $X \sim N(\mu, \Sigma) \in \mathbb{R}^m$ is
\[
M(t) = E[e^{t^T X}] = e^{t^T \mu + \frac{1}{2} t^T \Sigma t}
\]
5.1.1 Solution:
We start from the definition of the moment generating function:
\begin{align*}
M_x(t) = E[e^{t^T X}] &= \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \int_{-\infty}^{\infty} e^{t^T x} \cdot e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}\, dx \\
&= \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left( x^T \Sigma^{-1} x - \mu^T \Sigma^{-1} x - x^T \Sigma^{-1} \mu + \mu^T \Sigma^{-1} \mu \right) + t^T x}\, dx \\
&= \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left( x^T \Sigma^{-1} x - 2(\mu + \Sigma t)^T \Sigma^{-1} x + \mu^T \Sigma^{-1} \mu \right)}\, dx
\end{align*}
where the last line uses $\mu^T \Sigma^{-1} x = x^T \Sigma^{-1} \mu$ and $t^T x = (\Sigma t)^T \Sigma^{-1} x$ (both because $\Sigma^{-1}$ is symmetric).
Now we need to complete the square inside the integrand's exponential. We can see that we will want $(x - (\mu + \Sigma t))^T \Sigma^{-1} (x - (\mu + \Sigma t))$. Multiplying this out to see what we need:
\[
(x - (\mu + \Sigma t))^T \Sigma^{-1} (x - (\mu + \Sigma t)) = x^T \Sigma^{-1} x - 2(\mu + \Sigma t)^T \Sigma^{-1} x + (\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t)
\]
So we need the last term. Therefore we add and subtract it in the exponent, i.e., multiply by
\[
e^{-\frac{1}{2}(\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t)} \cdot e^{\frac{1}{2}(\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t)} = 1
\]
Now we have:
\begin{align*}
&= e^{\frac{1}{2}\left( (\mu + \Sigma t)^T \Sigma^{-1} (\mu + \Sigma t) - \mu^T \Sigma^{-1} \mu \right)} \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - (\mu + \Sigma t))^T \Sigma^{-1} (x - (\mu + \Sigma t))}\, dx \\
&= e^{\frac{1}{2}\left( \mu^T \Sigma^{-1} \mu + 2\mu^T t + t^T \Sigma t - \mu^T \Sigma^{-1} \mu \right)} \\
&= e^{\mu^T t + \frac{1}{2} t^T \Sigma t}
\end{align*}
since the remaining integral is the density of a $N(\mu + \Sigma t, \Sigma)$ distribution and integrates to 1. Because $\mu^T t = t^T \mu$, this is the stated result.
5.2 Previous Problem 2: Deriving the F Distribution
Define the random variables $U$ and $V$ as:
\[
U \sim \chi^2(k), \qquad V \sim \chi^2(a)
\]
Assume that $U$ and $V$ are independent. Define the random variable $F$ as:
\[
F = \frac{U}{k} \cdot \frac{a}{V}
\]
We will derive the F distribution and its expectation.
5.2.1 Solution:
We begin with finding the joint distribution of U and V . Because they are independent, we can
multiply their pdfs together. After doing so and combining like terms, we arrive at:
5.2. PREVIOUS PROBLEM 2: DERIVING THE F DISTRIBUTION 37
f(u, v) =
1
2
k+a
2
Γ(
k
2
)Γ(
a
2
)
u
k
2
1
v
a
2
1
e
(u+v)
2
We now need to do a change of variables. Rearranging the definition of F:
F =
U
k
·
a
V
U =
kF V
a
So we map U to
kF V
a
and V to V . The next step is finding the Jacobian:
|J| =
U
F
U
V
V
F
V
V
=
kV
a
kF
a
0 1
=
kV
a
Plug our inverse function into the joint pdf and multiply by the Jacobian to get (after combining like terms):
\[
f(F,V) = \frac{k}{a}\,\frac{1}{2^{\frac{k+a}{2}}\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{a}{2}\right)}\left(\frac{kFV}{a}\right)^{\frac{k}{2}-1} V^{\frac{a}{2}}\, e^{-\left(\frac{kFV}{a}+V\right)/2}
\]
To find the pdf of F, we simply integrate out V (which has support on $[0,\infty)$):
\begin{align*}
f(F) &= \int_{0}^{\infty} \frac{k}{a}\,\frac{1}{2^{\frac{k+a}{2}}\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{a}{2}\right)}\left(\frac{kFV}{a}\right)^{\frac{k}{2}-1} V^{\frac{a}{2}}\, e^{-\left(\frac{kFV}{a}+V\right)/2}\, dV \\
&= \frac{k}{a}\,\frac{1}{2^{\frac{k+a}{2}}\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{a}{2}\right)} \int_{0}^{\infty} \left(\frac{kFV}{a}\right)^{\frac{k}{2}-1} V^{\frac{a}{2}}\, e^{-\left(\frac{kFV}{a}+V\right)/2}\, dV \\
&= \frac{k}{a}\,\frac{1}{2^{\frac{k+a}{2}}\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{a}{2}\right)} \left(\frac{kF}{a}\right)^{\frac{k}{2}-1} \int_{0}^{\infty} V^{\frac{a+k}{2}-1}\, e^{-\frac{1}{2}\left(\frac{kF}{a}+1\right)V}\, dV
\end{align*}
Note that this is the kernel of a $\Gamma\!\left(\frac{a+k}{2},\ \frac{1}{2}\left(\frac{kF}{a}+1\right)\right)$ distribution. By multiplying and dividing by the appropriate normalizing constant we can integrate the integrand out to 1, leaving us with:
\[
f(F) = \frac{k}{a}\,\frac{\Gamma\!\left(\frac{a+k}{2}\right)}{2^{\frac{k+a}{2}}\,\Gamma\!\left(\frac{k}{2}\right)\Gamma\!\left(\frac{a}{2}\right)}\left(\frac{kF}{a}\right)^{\frac{k}{2}-1}\left(\frac{1}{2}\left(\frac{kF}{a}+1\right)\right)^{-\frac{a+k}{2}}
\]
Which is the pdf for the F distribution. One last step! Now, we either calculate the expected value for the F distribution the easy way or the hard way. This problem misleads us into thinking that the easy way is to use the pdf we just derived. The far easier path, though, is as follows:
\[
E[F] = E\left[\frac{U}{k}\cdot\frac{a}{V}\right] = \frac{a}{k}\,E\left[\frac{U}{V}\right]
\]
Using the independence of U and V:
\[
= \frac{a}{k}\,E[U]\,E\left[\frac{1}{V}\right]
\]
Using the mean of a $\chi^{2}$ and of an inverse $\chi^{2}$:
\[
= \frac{a}{k}\,(k)\left(\frac{1}{a-2}\right) = \frac{a}{a-2}
\]
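A quick Monte Carlo sanity check of this result (a Python sketch with hypothetical degrees of freedom $k = 5$ and $a = 10$; the same logic ports directly to Matlab):

```python
import numpy as np

rng = np.random.default_rng(0)
k, a, n = 5, 10, 1_000_000

# Build F draws from independent chi-square draws, exactly as in the definition.
U = rng.chisquare(k, n)
V = rng.chisquare(a, n)
F = (U / k) * (a / V)

mc_mean = F.mean()  # should be close to a/(a-2) = 1.25
```

The simulated mean lands near $a/(a-2)$ whenever $a > 2$; for $a \leq 2$ the mean does not exist.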
5.3 Practice Problem 1: Companion Form
Companion form basically rewrites any linear model (that has finite memory) into a model that looks
like a VAR(1). As such, we can determine the mean, variance, covariance, and whether a linear model
is stationary from this form. In general:
\[
\alpha_{t} = d + T\alpha_{t-1} + R\varepsilon_{t}
\]
is known as the companion form, where $\varepsilon_{t} \sim W.N.(0, Q)$. Structure-wise, $\alpha_{t}$ is $m \times 1$, $d$ is $m \times 1$, $T$ is $m \times m$, $R$ is $m \times q$, and $Q$ is $q \times q$.
By looking at how the companion form is written, we note that Q is the variance-covariance matrix
of the shocks, T is the matrix placing weights on the right-hand side variables, d is the vector of means,
and R is the vector of weights on the shock term. $\alpha$ is the vector of variables in general, leaving out the last shock lag and/or last variable lag.
The easiest way to demonstrate how companion form can be used is by working through an example.
Let’s look at an ARMA(3,1) model. First, let’s write the model as we are used to seeing it:
\[
y_{t} = d + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \phi_{3}y_{t-3} + \theta\varepsilon_{t-1} + \varepsilon_{t}
\]
This should look familiar. Looking at our companion form rules we can begin to build our matrices. Starting with $\alpha_{t}$:
\[
\alpha_{t} = \begin{bmatrix} y_{t} \\ y_{t-1} \\ y_{t-2} \\ \varepsilon_{t} \end{bmatrix}
\]
Notice how I left out $y_{t-3}$ and $\varepsilon_{t-1}$. Next, we build the d vector:
\[
d = \begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]
That's not too difficult to build. Now we go onto the T matrix:
\[
T = \begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \theta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
and then the $\alpha_{t-1}$ vector:
\[
\alpha_{t-1} = \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ y_{t-3} \\ \varepsilon_{t-1} \end{bmatrix}
\]
Notice that by multiplying T with $\alpha_{t-1}$ we get the terms with the coefficients in the ARMA process. And then, because $y_{t-1}$ must equal $y_{t-1}$ and $y_{t-2}$ must equal $y_{t-2}$, we place ones in the appropriate places. Only one more vector to go:
\[
R = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
Putting all of these together, we get the companion form:
\[
\begin{bmatrix} y_{t} \\ y_{t-1} \\ y_{t-2} \\ \varepsilon_{t} \end{bmatrix} = \begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \theta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ y_{t-3} \\ \varepsilon_{t-1} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}\varepsilon_{t}
\]
Companion forms are not unique. Rather, we can swap rows to create "new" companion forms. Each variation, if done correctly, should return the same forecasts and unconditional moments as the others.
I claimed earlier that companion form is useful for testing stationarity. How do we test this? We want to see if the coefficients on the middle terms lead to exploding $y_{t}$ values. To do so, we can check if the eigenvalues of our slope matrix (T) are less than 1 in modulus.² Recall that to check the eigenvalues of a matrix, we use:
\[
|T - \lambda I| = 0
\]
If any eigenvalue satisfies $|\lambda| \geq 1$, our process is not stationary.
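A small numerical illustration of this check (a Python sketch; the coefficient values are hypothetical, and the same two lines translate directly to Matlab's `eig`):

```python
import numpy as np

# Hypothetical ARMA(3,1) coefficients, chosen only for illustration.
phi1, phi2, phi3, theta = 0.5, 0.2, 0.1, 0.4

# Slope matrix T from the companion form above.
T = np.array([[phi1, phi2, phi3, theta],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])

eigs = np.linalg.eigvals(T)
stationary = np.all(np.abs(eigs) < 1)  # all eigenvalues inside the unit circle
```

For these values every eigenvalue lies strictly inside the unit circle, so the process is stationary.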
I also claimed that we can easily find the unconditional moments. Conveniently, if $\alpha_{t}$ is stationary, the unconditional moments follow formulaically:
\begin{align*}
E[\alpha_{t}] &= (I - T)^{-1}d = \mu_{\alpha} \\
\text{vec}(\text{var}(\alpha_{t})) &= \left(I_{m^{2}} - (T \otimes T)\right)^{-1}\text{vec}(RQR^{T}) = \text{vec}(\Gamma_{0}) \\
\text{cov}(\alpha_{t}, \alpha_{t-j}) &= T^{j}\Gamma_{0} = \Gamma_{j}
\end{align*}
Let’s apply these formulae to our ARMA(3,1) process. First, we solve for the eigenvalues:
²Technically, this is a necessary but not sufficient condition. In time-series econometrics, we often treat it as sufficient. For more information, see Carnegie Mellon.
\[
|T - \lambda I| = \left|\begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \theta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} - \begin{bmatrix} \lambda & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \lambda \end{bmatrix}\right| = \begin{vmatrix} \phi_{1}-\lambda & \phi_{2} & \phi_{3} & \theta \\ 1 & -\lambda & 0 & 0 \\ 0 & 1 & -\lambda & 0 \\ 0 & 0 & 0 & -\lambda \end{vmatrix}
\]
Because this is quite large, I use a solver to calculate the determinant and get:
\[
0 = -\lambda\left(\lambda\left(\lambda(\phi_{1} - \lambda) + \phi_{2}\right) + \phi_{3}\right)
\]
If the values of the coefficients are such that each root of the equation is, in absolute terms, less than 1 (on the real or complex unit circle), then the process is stationary. Here, $\lambda = 0$ is always one eigenvalue; the remaining eigenvalues solve $\lambda^{3} - \phi_{1}\lambda^{2} - \phi_{2}\lambda - \phi_{3} = 0$, so stationarity depends on the coefficient values.
Now let's solve for the unconditional mean:
\begin{align*}
E[\alpha_{t}] &= (I - T)^{-1}d \\
&= \left(\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} \phi_{1} & \phi_{2} & \phi_{3} & \theta \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\right)^{-1}\begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix} \\
&= \begin{bmatrix} 1-\phi_{1} & -\phi_{2} & -\phi_{3} & -\theta \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1}\begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix}
\end{align*}
Let $P \equiv 1 - \phi_{1} - \phi_{2} - \phi_{3}$:
\[
= \frac{1}{P}\begin{bmatrix} 1 & \phi_{2}+\phi_{3} & \phi_{3} & \theta \\ 1 & 1-\phi_{1} & \phi_{3} & \theta \\ 1 & 1-\phi_{1} & 1-\phi_{1}-\phi_{2} & \theta \\ 0 & 0 & 0 & P \end{bmatrix}\begin{bmatrix} d \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{d}{P} \\[2pt] \frac{d}{P} \\[2pt] \frac{d}{P} \\[2pt] 0 \end{bmatrix}
\]
This matches all of the results from previous problems! The mean is usually tractable to do by hand. The variance, though, can take up lots of space due to the Kronecker product. A computer can easily handle the computation, though (see Problem Set 5).
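A sketch of that computation in Python (the parameter values are hypothetical; the same `kron`/`solve` calls exist in Matlab):

```python
import numpy as np

# Hypothetical ARMA(3,1) parameter values, chosen only for illustration.
phi1, phi2, phi3, theta = 0.5, 0.2, 0.1, 0.4
d_scalar, sigma2 = 1.0, 1.0

T = np.array([[phi1, phi2, phi3, theta],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
d = np.array([d_scalar, 0, 0, 0])
R = np.array([[1.0], [0.0], [0.0], [1.0]])
Q = np.array([[sigma2]])
P = 1 - phi1 - phi2 - phi3

# Unconditional mean: (I - T)^{-1} d, which should equal d/P in the first three entries.
mu = np.linalg.solve(np.eye(4) - T, d)

# Unconditional variance: vec(Gamma_0) = (I_{m^2} - T kron T)^{-1} vec(R Q R').
m = 4
vecRQR = (R @ Q @ R.T).reshape(-1, order="F")  # column-stacking vec
vecG0 = np.linalg.solve(np.eye(m**2) - np.kron(T, T), vecRQR)
Gamma0 = vecG0.reshape(m, m, order="F")
```

By construction, `Gamma0` satisfies the stationarity fixed point $\Gamma_{0} = T\Gamma_{0}T^{T} + RQR^{T}$.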
5.4 Matlab Help
This week’s Matlab section is hard. Because it’s hard, I’m going to provide you a little guidance so
that you know that you’re on the right track.
First, please use vectorization to generate random data. Do not use loops; otherwise the run time for your code can explode. Here's an example:
Notice how the variables "e" and "n" are built. Instead of having to loop, Matlab can vectorize
the process.
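Since the original screenshot does not survive here, a Python sketch of the same idea (the names `e` and `n` echo the screenshot and are otherwise hypothetical; in Matlab the vectorized calls are `randn(T,1)` and `randn(T,2)`):

```python
import numpy as np

rng = np.random.default_rng(0)
T_obs, sigma = 500, 1.0

# Vectorized: draw every shock in one call instead of one per loop iteration.
e = rng.normal(0.0, sigma, size=T_obs)        # all T shocks at once
n = rng.normal(0.0, sigma, size=(T_obs, 2))   # a T x 2 block of noise

# The slow alternative draws shocks one at a time inside a loop:
e_loop = np.empty(T_obs)
for t in range(T_obs):
    e_loop[t] = rng.normal(0.0, sigma)
```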
Here is how your graphs from question four should look:
Here is how your sample ACF and theoretical ACF should approximately line up:
Here is how your VAR(1) ACFs should look:
And lastly, this is how your IRFs should look:
Good luck!
Chapter 6
Bias and Consistency
6.1 The Analogy Principle
Most of this is in Drew’s slides, but this topic is important. I will cover the analogy principle quickly.
Suppose we have a function $\beta = h(\theta)$ where $\theta = E[g(y_{i})]$. We want to estimate $\beta$. We do not know $E[g(y_{i})]$. Therefore, we replace $\theta$ with $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n}g(y_{i})$, so that $\hat{\beta} = h(\hat{\theta})$.
Let's go through a simple example using variance. Variance is given by:
\[
\sigma^{2} = E[X^{2}] - E[X]^{2}
\]
In practice, replace the expectations with sample means. Therefore:
\[
\hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} - \left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2}
\]
We now have a viable estimate for the variance.
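The plug-in formula above is two lines of code (a Python sketch with hypothetical data; it matches the population-variance convention, i.e. the `1/n` divisor):

```python
import numpy as np

def analogy_variance(x):
    """Plug-in variance estimator: E[X^2] - E[X]^2 with sample means."""
    x = np.asarray(x, dtype=float)
    return np.mean(x**2) - np.mean(x)**2

x = np.array([1.0, 2.0, 4.0, 7.0])
v = analogy_variance(x)  # same value as np.var(x), which also divides by n
```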
6.2 Bias
The definition of bias is:
\[
\text{bias} = E[\hat{\theta}] - \theta
\]
If bias = 0, we say that the estimator is unbiased. How does this apply to the analogy principle? Ideally, we'd like to create unbiased estimators for the parameters we are looking for. Let's look at the estimator for the variance that we just found.
First, we should check our two plug-ins to see if they are unbiased.
Letting $\mu_{2} \equiv E[x_{i}^{2}]$:
\[
E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}x_{i}^{2}\right] = \frac{1}{n}\sum_{i=1}^{n}E[x_{i}^{2}] = \frac{1}{n}\sum_{i=1}^{n}\mu_{2} = \frac{1}{n}\cdot n\mu_{2} = \mu_{2}
\]
So that part is unbiased. Let's check the standard mean:
\[
E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}x_{i}\right] = \frac{1}{n}\sum_{i=1}^{n}E[x_{i}] = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}\cdot n\mu = \mu
\]
So this is also unbiased. Let's check them together in the variance estimate:
\begin{align*}
E[\hat{\sigma}^{2}] &= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} - \left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2}\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] - E\left[\left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2}\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] - E\left[\left(\mu - \mu + \frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2}\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] - E\left[\left(\mu + \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\mu)\right)^{2}\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] - E\left[\mu^{2} + 2\mu\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\mu) + \left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\mu)\right)^{2}\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}\right] - E\left[\mu^{2} + 2\mu(\bar{x}-\mu) + (\bar{x}-\mu)^{2}\right] \\
&= \mu_{2} - \mu^{2} - 2\mu(\mu - \mu) - E[(\bar{x}-\mu)^{2}] \\
&= \sigma^{2} - Var(\bar{x}) \\
&= \sigma^{2} - Var\left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right) \\
&= \sigma^{2} - \frac{1}{n^{2}}\sum_{i=1}^{n}Var(x_{i}) \\
&= \sigma^{2} - \frac{1}{n^{2}}\cdot n\sigma^{2} \\
&= \sigma^{2} - \frac{\sigma^{2}}{n} = \left(1 - \frac{1}{n}\right)\sigma^{2}
\end{align*}
Unfortunately, our estimator is biased, with bias $-\frac{\sigma^{2}}{n}$. In class, Drew showed you that an unbiased estimator for the variance is:
\[
\hat{s}^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}
\]
To correct our estimator, we simply add $\frac{\hat{s}^{2}}{n}$:
\[
\hat{\sigma}^{2}_{unbiased} = \hat{\sigma}^{2} + \frac{\hat{s}^{2}}{n}
\]
\begin{align*}
E\left[\hat{\sigma}^{2}_{unbiased}\right] &= E\left[\hat{\sigma}^{2} + \frac{\hat{s}^{2}}{n}\right] = E[\hat{\sigma}^{2}] + E\left[\frac{\hat{s}^{2}}{n}\right] \\
&= \sigma^{2} - \frac{\sigma^{2}}{n} + \frac{\sigma^{2}}{n} = \sigma^{2}
\end{align*}
Note that sometimes obtaining an unbiased estimator requires sacrificing precision. This trade-off gives rise to the bias-variance trade-off idea. One way to balance the two is to minimize the mean-squared error:
\[
m.s.e.(\hat{\theta}) = Var(\hat{\theta}) + \left(\text{bias}(\hat{\theta})\right)^{2}
\]
6.3 Consistency
Let's complicate the issue now. When we take asymptotics, we are looking not at how the estimators behave in expectation, but at how they behave as our sample size nears infinity. We have three theorems that will be helpful in this section.
The Continuous Mapping Theorem says that if $x_{n} \xrightarrow{p} x$, then $h(x_{n}) \xrightarrow{p} h(x)$ for continuous $h(\cdot)$. The relationship also holds for convergence in distribution and convergence almost surely.
Slutsky's Theorem is an extension of the CMT. This theorem states that if $x_{n} \xrightarrow{d} x$ and $y_{n} \xrightarrow{p} c$ for a constant c, then:
(i) $x_{n} + y_{n} \xrightarrow{d} x + c$
(ii) $x_{n}y_{n} \xrightarrow{d} cx$
(iii) $x_{n}/y_{n} \xrightarrow{d} x/c$, provided $c \neq 0$
Note the convergence requirements: $x_{n}$ converges in distribution, while $y_{n}$ converges in probability to a constant.
The last theorem is the Delta Method. You will use this theorem over and over again throughout the year with the Central Limit Theorem. Make sure you know it! If $\sqrt{n}(x_{n} - \theta) \xrightarrow{d} N(0, V)$, then:
\[
\sqrt{n}\left(h(x_{n}) - h(\theta)\right) \xrightarrow{d} N\left(0, H^{T}VH\right)
\]
where $H = \frac{\partial}{\partial\theta}h(\theta)$, evaluated at the true $\theta$.
6.3.1 Application to the Sample Variance
Recall our estimator for $\sigma^{2}$ consisted of two parts:
\[
\hat{\theta} = \begin{bmatrix} \hat{\theta}_{1} \\ \hat{\theta}_{2} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n}x_{i} \\[4pt] \frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} \end{bmatrix}
\]
By the Weak Law of Large Numbers, $\frac{1}{n}\sum_{i=1}^{n}x_{i} \xrightarrow{p} E[x]$. Similarly, the WLLN means that $\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} \xrightarrow{p} E[x^{2}]$. Recall the $\sigma^{2}$ estimator:
\[
\hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} - \left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2}
\]
The difference of two continuous functions is continuous, so the CMT applies:
\[
\frac{1}{n}\sum_{i=1}^{n}x_{i}^{2} - \left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right)^{2} \xrightarrow{p} E[x^{2}] - E[x]^{2} = \sigma^{2}
\]
What we have found is that while the sample variance is not unbiased, it is consistent. We now need to find the variance of our estimator using the Delta Method formula.
First, we start with V:
\begin{align*}
V &= E\left[\left(\begin{bmatrix} x \\ x^{2} \end{bmatrix} - \begin{bmatrix} E[x] \\ E[x^{2}] \end{bmatrix}\right)\left(\begin{bmatrix} x \\ x^{2} \end{bmatrix} - \begin{bmatrix} E[x] \\ E[x^{2}] \end{bmatrix}\right)^{T}\right] \\
&= E\left[\begin{bmatrix} x - E[x] \\ x^{2} - E[x^{2}] \end{bmatrix}\begin{bmatrix} x - E[x] \\ x^{2} - E[x^{2}] \end{bmatrix}^{T}\right]
\end{align*}
Note that $h(\theta) = \theta_{2} - \theta_{1}^{2}$. Now we find H:
\[
H = \frac{\partial}{\partial\theta}h(\theta) = \frac{\partial}{\partial\theta}\left(\theta_{2} - \theta_{1}^{2}\right) = \begin{bmatrix} -2\theta_{1} \\ 1 \end{bmatrix}
\]
Putting this all together, our asymptotic variance is:
\[
H^{T}VH = \begin{bmatrix} -2\theta_{1} \\ 1 \end{bmatrix}^{T} E\left[\begin{bmatrix} x - E[x] \\ x^{2} - E[x^{2}] \end{bmatrix}\begin{bmatrix} x - E[x] \\ x^{2} - E[x^{2}] \end{bmatrix}^{T}\right]\begin{bmatrix} -2\theta_{1} \\ 1 \end{bmatrix}
\]
Therefore, via the Delta Method:
\[
\sqrt{n}(\hat{\sigma}^{2} - \sigma^{2}) \xrightarrow{d} N\left(0,\ H^{T}VH\right)
\]
with V and H as above, where $\theta_{1} = E[x]$.
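An exact arithmetic check of $H^{T}VH$ (a Python sketch using a Bernoulli(p) example, which is a hypothetical choice of distribution; any distribution with four finite moments would do):

```python
from fractions import Fraction

p = Fraction(1, 3)  # hypothetical parameter value

# Raw moments of Bernoulli: E[x^k] = p for every k >= 1, since x is 0 or 1.
Ex, Ex2, Ex3, Ex4 = p, p, p, p

# V = covariance matrix of (x, x^2); for Bernoulli, x^2 = x.
V = [[Ex2 - Ex**2, Ex3 - Ex * Ex2],
     [Ex3 - Ex * Ex2, Ex4 - Ex2**2]]

# H = (-2*theta_1, 1)' with theta_1 = E[x].
H = [-2 * Ex, Fraction(1)]
avar = sum(H[i] * V[i][j] * H[j] for i in range(2) for j in range(2))

# Classical result: the asymptotic variance of the plug-in sample
# variance is the fourth central moment minus sigma^4.
mu, sigma2 = Ex, Ex2 - Ex**2
m4 = p * (1 - mu)**4 + (1 - p) * (0 - mu)**4
```

Because everything is a `Fraction`, the sandwich form and $E[(x-\mu)^{4}] - \sigma^{4}$ agree exactly, not just to floating-point tolerance.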
6.4 Midterm Review
We have covered an advanced probability course in less than a half-semester. The amount of material
you have learned over the last month and a half is not insignificant. In an effort to help organize your
studying for the midterm, I have created a list of the big topics.
6.4.1 Basics of Probability
(a) Probability rules
(i) Conditional Probability
(ii) Bayes’ Rule
(iii) Law of Total Probability
(iv) Monte Hall Problem
(b) Properties of the CDF and PDF
(c) Converting between CDFs and PDFs
6.4.2 Distributions
(a) Joint, marginal, and conditional PDFs
(b) Univariate and Multivariate Change of Variables
(c) Calculating expectations, variances, covariances
(d) Law of Total Variance and Law of Iterated Expectations
(e) Distribution kernels and normalizing constants
(f) Moment Generating Functions
6.4.3 Dependent Random Variables
(a) Common representations
(i) MA(q)
(ii) AR(p)
(iii) ARMA(p,q)
(iv) MA(∞) form
(v) VAR(p)
(vi) White Noise assumptions
(b) Unconditional expectations, variances, covariances
(i) From standard form
(ii) From MA(∞) form
(iii) From Companion form
(c) Impulse Response Functions
(d) Autocorrelation Functions
(e) Conditional expectations and variances
(f) Companion Form
(i) Stationarity test
(ii) Formulae for means, variances, covariances
6.4.4 Analogy Principle and Bias
(a) How to apply the Analogy Principle
(b) Formula for bias
(c) Calculating bias
(d) Bias-variance trade-off
Recall that here we are living in "expectations world."
6.4.5 Asymptotics
(a) Convergence in distribution
(b) Convergence in probability
(c) Weak Law of Large Numbers
(d) Central Limit Theorem/Asymptotic Normality
(e) Continuous Mapping Theorem
(f) Slutsky’s Theorem
(g) The Delta Method
Now we are living in "asymptopia."³
³Check out this book on estimation.
Chapter 7
Asymptotics
7.1 Previous Problem: Gaussian Autoregressive Process
Consider the following autoregression model:
\[
y_{t} = \mu + \phi(y_{t-1} - \mu) + \varepsilon_{t}, \qquad \varepsilon_{t} \overset{i.i.d.}{\sim} N(0, \sigma^{2})
\]
7.1.1 Part a
Calculate the expected value of the sample mean of $y_{t}$.
Using brute force:
\[
E[\bar{y}_{T}] = E\left[\frac{1}{T}\sum_{t=1}^{T}y_{t}\right] = \frac{1}{T}\sum_{t=1}^{T}E[y_{t}] = \frac{1}{T}\sum_{t=1}^{T}\mu = \mu
\]
7.1.2 Part b
Write the AR(1) in its MA(∞) representation. Provide an explicit description of the MA(∞) coefficients $\psi_{j}$.
Using recursive substitution:
\begin{align*}
y_{t} &= \mu + \phi\left(\mu + \phi(y_{t-2} - \mu) + \varepsilon_{t-1} - \mu\right) + \varepsilon_{t} \\
&= \mu + \phi^{2}(y_{t-2} - \mu) + \phi\varepsilon_{t-1} + \varepsilon_{t} \\
&= \mu + \phi^{2}\left(\mu + \phi(y_{t-3} - \mu) + \varepsilon_{t-2} - \mu\right) + \phi\varepsilon_{t-1} + \varepsilon_{t} \\
&= \mu + \phi^{3}(y_{t-3} - \mu) + \phi^{2}\varepsilon_{t-2} + \phi\varepsilon_{t-1} + \varepsilon_{t}
\end{align*}
We can see the pattern and note that this process will become:
\[
y_{t} = \mu + \sum_{j=0}^{\infty}\phi^{j}\varepsilon_{t-j}
\]
Therefore the MA weights are $\psi_{j} = \phi^{j}$.
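The impulse response of the AR(1) to a unit shock reproduces these weights, which gives a quick check (a Python sketch; $\phi = 0.8$ is a hypothetical value):

```python
import numpy as np

phi, horizon = 0.8, 10  # hypothetical persistence and horizon

# Feed a unit shock at j = 0 through the AR(1) recursion and record the response.
irf = np.empty(horizon)
y = 0.0
for j in range(horizon):
    y = phi * y + (1.0 if j == 0 else 0.0)  # shock only at j = 0
    irf[j] = y

psi = phi ** np.arange(horizon)  # the MA(inf) weights derived above
```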
7.1.3 Part c
Consider the following sum:
\[
S = \sum_{j=0}^{\infty}\psi_{j}
\]
Provide an explicit expression for S.
This is fairly straightforward if we remember our geometric series (assuming $|\phi| < 1$):
\[
S = \sum_{j=0}^{\infty}\psi_{j} = \sum_{j=0}^{\infty}\phi^{j} = \frac{1}{1-\phi}
\]
7.1.4 Part d
Consider the variance of the sample mean:
\[
Var(\bar{y}_{T}) = Var\left(\frac{1}{T}\sum_{t=1}^{T}y_{t}\right)
\]
Provide an explicit expression for the variance. How does this compare to when $y_{t}$ is i.i.d.?
This is painful. But it is good to see at least once.
\begin{align*}
Var(\bar{y}_{T}) &= E[(\bar{y}_{T} - \mu)^{2}] \\
&= E\left[\left(\frac{1}{T}\sum_{t=1}^{T}y_{t} - \mu\right)^{2}\right] \\
&= E\left[\left(\frac{1}{T}\sum_{t=1}^{T}(y_{t} - \mu)\right)^{2}\right] \\
&= \frac{1}{T^{2}}E\left[\{(y_{1}-\mu) + (y_{2}-\mu) + \cdots + (y_{T}-\mu)\}\{(y_{1}-\mu) + (y_{2}-\mu) + \cdots + (y_{T}-\mu)\}\right] \\
&= \frac{1}{T^{2}}E\big[(y_{1}-\mu)\{(y_{1}-\mu) + \cdots + (y_{T}-\mu)\} + (y_{2}-\mu)\{(y_{1}-\mu) + \cdots + (y_{T}-\mu)\} \\
&\qquad\qquad + \cdots + (y_{T}-\mu)\{(y_{1}-\mu) + \cdots + (y_{T}-\mu)\}\big]
\end{align*}
Notice that these are autocovariances. So:
\begin{align*}
Var(\bar{y}_{T}) &= \frac{1}{T^{2}}\left[\gamma_{0} + \gamma_{1} + \cdots + \gamma_{T-1} + \gamma_{1} + \gamma_{0} + \gamma_{1} + \cdots + \gamma_{T-2} + \cdots + \gamma_{T-1} + \gamma_{T-2} + \cdots + \gamma_{0}\right] \\
&= \frac{1}{T^{2}}\left[T\gamma_{0} + 2(T-1)\gamma_{1} + 2(T-2)\gamma_{2} + \cdots + 2\gamma_{T-1}\right]
\end{align*}
Now recall that for the AR(1) model, $\gamma_{j} = \phi^{j}\gamma_{0}$ and $\gamma_{0} = \frac{\sigma^{2}}{1-\phi^{2}}$. Plugging these in:
\begin{align*}
Var(\bar{y}_{T}) &= \frac{1}{T^{2}}\left[T\gamma_{0} + 2(T-1)\phi\gamma_{0} + 2(T-2)\phi^{2}\gamma_{0} + \cdots + 2\phi^{T-1}\gamma_{0}\right] \\
&= \frac{\gamma_{0}}{T^{2}}\left[T + 2(T-1)\phi + 2(T-2)\phi^{2} + \cdots + 2\phi^{T-1}\right] \\
&= \frac{\sigma^{2}}{T^{2}(1-\phi^{2})}\left[T + 2(T-1)\phi + 2(T-2)\phi^{2} + \cdots + 2\phi^{T-1}\right]
\end{align*}
This expression is the variance for $\bar{y}_{T}$. Note that if $\phi = 0$, then the process is i.i.d. and we get:
\[
Var(\bar{y}_{T}) = \frac{\sigma^{2}}{T}
\]
This expression is less than the non-i.i.d. case (for $\phi > 0$).
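The closed form can be checked against a direct sum of autocovariances (a Python sketch; $\phi$, $\sigma^{2}$, and $T$ are hypothetical values):

```python
import numpy as np

phi, sigma2, T = 0.5, 1.0, 50  # hypothetical parameter values
gamma0 = sigma2 / (1 - phi**2)

# Direct route: Var(ybar) is the mean of the full T x T autocovariance matrix,
# whose (s, t) entry is Cov(y_s, y_t) = phi^|s-t| * gamma0.
idx = np.arange(T)
cov = gamma0 * phi ** np.abs(idx[:, None] - idx[None, :])
var_direct = cov.sum() / T**2

# Closed form from the derivation above.
j = np.arange(1, T)
var_formula = (sigma2 / (T**2 * (1 - phi**2))) * (T + 2 * np.sum((T - j) * phi**j))
```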
7.1.5 Part e
Under the parametric assumption that the data is normally distributed, what is the exact finite sampling distribution of the sample mean $\bar{y}_{T}$?
Sums of normal random variables are normal, so the sampling distribution is:
\[
\bar{y}_{T} \sim N\left(\mu,\ Var(\bar{y}_{T})\right)
\]
with $Var(\bar{y}_{T})$ as derived in Part d.
7.2 Practice Problem 1: Hansen 8.1
Let X be distributed Bernoulli with $P(X = 1) = p$ and $P(X = 0) = 1 - p$.
(a) Show that $p = E[X]$
(b) Write down the moment estimator $\hat{p}$
(c) Find $Var(\hat{p})$
(d) Find the asymptotic distribution of $\sqrt{n}(\hat{p} - p)$ as $n \to \infty$
7.2.1 Part a
We use the old definition of E[X] from way back at the beginning of the semester:
\[
E[X] = p(1) + (1-p)(0) = p
\]
7.2.2 Part b
Let's use the analogy principle:
\[
p = E[X] \quad\Longrightarrow\quad \hat{p} = \frac{1}{n}\sum_{i=1}^{n}x_{i}
\]
7.2.3 Part c
No tricks here either. We'll use brute force. Assuming independence:
\[
Var(\hat{p}) = Var\left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right) = \frac{1}{n^{2}}\sum_{i=1}^{n}Var(x_{i}) = \frac{1}{n^{2}}\sum_{i=1}^{n}p(1-p) = \frac{1}{n}p(1-p)
\]
7.2.4 Part d
Using the Central Limit Theorem:
\[
\sqrt{n}(\hat{p} - p) \xrightarrow{d} N(0, V)
\]
where we know the asymptotic mean is zero because $\hat{p}$ converges to $p$ by the WLLN. There are two ways to find the asymptotic variance here. First is the brute force method:
\begin{align*}
V &= E\left[(x - E[x])^{2}\right] = E\left[x^{2} - 2E[x]x + E[x]^{2}\right] \\
&= E[x^{2}] - 2E[x]^{2} + E[x]^{2} = E[x^{2}] - E[x]^{2} \\
&= p - p^{2} = p(1-p)
\end{align*}
or second we can use:
\[
V = Var(\sqrt{n}\,\hat{p}) = n\,Var(\hat{p}) = n\cdot\frac{1}{n}p(1-p) = p(1-p)
\]
Either method should always give the same answer.
7.3 Practice Problem 2: Hansen 8.3
Find the moment estimator of $\mu_{3} = E[x^{3}]$ and show that $\sqrt{n}(\hat{\mu}_{3} - \mu_{3}) \xrightarrow{d} N(0, V)$. In addition, write V as a function of the moments of X.
7.3.1 Solution
We first use the analogy principle:
\[
\mu_{3} = E[x^{3}] \quad\Longrightarrow\quad \hat{\mu}_{3} = \frac{1}{n}\sum_{i=1}^{n}x_{i}^{3}
\]
Next, we show that $\hat{\mu}_{3}$ is consistent. This is straightforward, as the WLLN takes care of it:
\[
\hat{\mu}_{3} = \frac{1}{n}\sum_{i=1}^{n}x_{i}^{3} \xrightarrow{p} E[x^{3}] = \mu_{3}
\]
So now we know that $\sqrt{n}(\hat{\mu}_{3} - \mu_{3}) \xrightarrow{d} N(0, V)$. Next, we find V:
\begin{align*}
V &= E\left[\left(x^{3} - E[x^{3}]\right)^{2}\right] \\
&= E\left[x^{6} - 2x^{3}E[x^{3}] + E[x^{3}]^{2}\right] \\
&= E[x^{6}] - 2E[x^{3}]^{2} + E[x^{3}]^{2} \\
&= E[x^{6}] - E[x^{3}]^{2}
\end{align*}
We could have used the same trick as we used in the first practice problem if we wanted to calculate the sample variance for the estimator.
7.4 Practice Problem 3: Hansen 8.8
Assume that:
\[
\sqrt{n}\begin{pmatrix} \hat{\theta}_{1} - \theta_{1} \\ \hat{\theta}_{2} - \theta_{2} \end{pmatrix} \xrightarrow{d} N(0, \Sigma)
\]
Use the Delta Method to find the asymptotic distribution of the following statistics:
(a) $\hat{\theta}_{1}\hat{\theta}_{2}$
(b) $e^{\hat{\theta}_{1} + \hat{\theta}_{2}}$
(c) $\dfrac{\hat{\theta}_{1}}{\hat{\theta}_{2}^{2}}$, if $\theta_{2} \neq 0$
(d) $\hat{\theta}_{1}^{3} + \hat{\theta}_{1}\hat{\theta}_{2}^{2}$
7.4.1 Part a
Using Slutsky's Theorem, we know that $\hat{\theta}_{1}\hat{\theta}_{2} \xrightarrow{p} \theta_{1}\theta_{2}$, since $\hat{\theta}_{1} \xrightarrow{p} \theta_{1}$ and $\hat{\theta}_{2} \xrightarrow{p} \theta_{2}$ from the set-up of the problem. So we only need to find the asymptotic variance. Using the Delta Method, we start with H:
\[
H = \frac{\partial}{\partial\theta}\left(\theta_{1}\theta_{2}\right) = \begin{bmatrix} \theta_{2} \\ \theta_{1} \end{bmatrix}
\]
We are given that $V = \Sigma$, so the asymptotic distribution is:
\[
\sqrt{n}\left(\hat{\theta}_{1}\hat{\theta}_{2} - \theta_{1}\theta_{2}\right) \xrightarrow{d} N\left(0,\ \begin{bmatrix} \theta_{2} \\ \theta_{1} \end{bmatrix}^{T}\Sigma\begin{bmatrix} \theta_{2} \\ \theta_{1} \end{bmatrix}\right)
\]
7.4.2 Part b
We know that $e^{\hat{\theta}_{1}+\hat{\theta}_{2}} \xrightarrow{p} e^{\theta_{1}+\theta_{2}}$ by the Continuous Mapping Theorem, since $\hat{\theta}_{1}+\hat{\theta}_{2} \xrightarrow{p} \theta_{1}+\theta_{2}$ by Slutsky's Theorem and the exponential transformation is continuous. We now find H:
\[
H = \frac{\partial}{\partial\theta}e^{\theta_{1}+\theta_{2}} = \begin{bmatrix} e^{\theta_{1}+\theta_{2}} \\ e^{\theta_{1}+\theta_{2}} \end{bmatrix}
\]
Putting this all together:
\[
\sqrt{n}\left(e^{\hat{\theta}_{1}+\hat{\theta}_{2}} - e^{\theta_{1}+\theta_{2}}\right) \xrightarrow{d} N\left(0,\ \begin{bmatrix} e^{\theta_{1}+\theta_{2}} \\ e^{\theta_{1}+\theta_{2}} \end{bmatrix}^{T}\Sigma\begin{bmatrix} e^{\theta_{1}+\theta_{2}} \\ e^{\theta_{1}+\theta_{2}} \end{bmatrix}\right)
\]
7.4.3 Part c
We know that $\frac{\hat{\theta}_{1}}{\hat{\theta}_{2}^{2}} \xrightarrow{p} \frac{\theta_{1}}{\theta_{2}^{2}}$ by the CMT, as division is continuous as long as the denominator is not zero (which we are given), and since by the CMT $\hat{\theta}_{2}^{2} \xrightarrow{p} \theta_{2}^{2}$. We now find H:
\[
H = \frac{\partial}{\partial\theta}\left(\frac{\theta_{1}}{\theta_{2}^{2}}\right) = \begin{bmatrix} \frac{1}{\theta_{2}^{2}} \\[4pt] -\frac{2\theta_{1}}{\theta_{2}^{3}} \end{bmatrix}
\]
Altogether, we now have:
\[
\sqrt{n}\left(\frac{\hat{\theta}_{1}}{\hat{\theta}_{2}^{2}} - \frac{\theta_{1}}{\theta_{2}^{2}}\right) \xrightarrow{d} N\left(0,\ \begin{bmatrix} \frac{1}{\theta_{2}^{2}} \\[4pt] -\frac{2\theta_{1}}{\theta_{2}^{3}} \end{bmatrix}^{T}\Sigma\begin{bmatrix} \frac{1}{\theta_{2}^{2}} \\[4pt] -\frac{2\theta_{1}}{\theta_{2}^{3}} \end{bmatrix}\right)
\]
7.4.4 Part d
We know that $\hat{\theta}_{1}^{3} + \hat{\theta}_{1}\hat{\theta}_{2}^{2} \xrightarrow{p} \theta_{1}^{3} + \theta_{1}\theta_{2}^{2}$ by the CMT, since $\hat{\theta}_{1}^{3} \xrightarrow{p} \theta_{1}^{3}$ by the CMT, $\hat{\theta}_{2}^{2} \xrightarrow{p} \theta_{2}^{2}$ by the CMT, and $\hat{\theta}_{1}\hat{\theta}_{2}^{2} \xrightarrow{p} \theta_{1}\theta_{2}^{2}$ by Slutsky's Theorem. We now find H:
\[
H = \frac{\partial}{\partial\theta}\left(\theta_{1}^{3} + \theta_{1}\theta_{2}^{2}\right) = \begin{bmatrix} 3\theta_{1}^{2} + \theta_{2}^{2} \\ 2\theta_{1}\theta_{2} \end{bmatrix}
\]
Now pulling this all together:
\[
\sqrt{n}\left(\hat{\theta}_{1}^{3} + \hat{\theta}_{1}\hat{\theta}_{2}^{2} - (\theta_{1}^{3} + \theta_{1}\theta_{2}^{2})\right) \xrightarrow{d} N\left(0,\ \begin{bmatrix} 3\theta_{1}^{2} + \theta_{2}^{2} \\ 2\theta_{1}\theta_{2} \end{bmatrix}^{T}\Sigma\begin{bmatrix} 3\theta_{1}^{2} + \theta_{2}^{2} \\ 2\theta_{1}\theta_{2} \end{bmatrix}\right)
\]
7.5 Matlab Checkpoints
The Matlab comes at the end of a long theoretical exercise. Make sure you stay close to the theorems and you'll be fine. To help you out, here are a few checks to make sure your code's output is correct.
First, if you use parameter values of α = 2 and δ = 0.5, your true asymptotic variance for your
estimator should be 1.25.
Second, here are some pictures of how your histograms should eventually look:
Each histogram has 50,000 different variance calculations. Hopefully these help you stay on the
right track.
Chapter 8
Maximum Likelihood Estimation
8.1 Maximum Likelihood Theory
8.1.1 Basics of MLE
The Maximum Likelihood Estimator seeks to estimate a parameter that maximizes the likelihood
function. The likelihood function describes the probability of observing the data that we’ve collected
with parameters as the arguments. Of course, the underlying functions and therefore parameters in
use are chosen by the modeller. The likelihood function is given as:
\[
L_{n}(\theta|Y) = \prod_{i=1}^{n}f(y_{i}|\theta)
\]
where $f(y_{i}|\theta)$ is the underlying DGP of the data.
To make analysis easier, we often take the natural log of the likelihood function. Note that because
natural log is a monotonically increasing function, the argmax of the log transformation is the same
as the argmax of the original likelihood function.
The log-likelihood function is:
\[
\ell_{n}(\theta) = \sum_{i=1}^{n}\ln\left(f(y_{i}|\theta)\right)
\]
Remember that these functions are functions of $\theta$; we keep y fixed.
To estimate $\theta$, we find the score vector. The score vector is defined as:
\[
s_{n}(\theta) = \frac{\partial \ell_{n}(\theta)}{\partial\theta}
\]
Note that if $\theta$ is $k \times 1$, then $s_{n}(\theta)$ is also $k \times 1$.
If the log-likelihood function is differentiable over $\theta$, then we can set the score vector equal to zero and solve for $\hat{\theta}$. Recall from your microeconomics class that this FOC is necessary but not sufficient for determining whether we are at a maximum. As such, we should check the SOSCs, $\frac{\partial^{2}\ell_{n}(\theta)}{\partial\theta\partial\theta^{T}} < 0$ (negative definite in the vector case), just to be sure.
8.1.2 Fisher Information Matrix
The Fisher information matrix conveys how much information the data Y carries about the unknown parameter $\theta$. It is given by:
\[
\mathcal{I}_{\theta,n} = Var\left(\frac{\partial \ell_{n}(\theta)}{\partial\theta}\right) = E\left[\frac{\partial \ell_{n}(\theta)}{\partial\theta}\,\frac{\partial \ell_{n}(\theta)}{\partial\theta^{T}}\right]
\]
Note that if our model is regular (see slide 23 of Topic 7) and correctly specified (see slide 6 of Topic 7), then the Fisher information matrix equals the negative of the expected Hessian:
\[
\mathcal{I}_{\theta,n} = -\mathcal{H}_{\theta,n} \quad\Longleftrightarrow\quad E\left[\frac{\partial \ell_{n}(\theta)}{\partial\theta}\,\frac{\partial \ell_{n}(\theta)}{\partial\theta^{T}}\right] = -E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\theta\partial\theta^{T}}\right]
\]
It is useful to remember that the asymptotic distribution of our MLE estimators is:
\[
\sqrt{n}\left(\hat{\theta}_{n} - \theta\right) \xrightarrow{d} N\left(0,\ \mathcal{I}_{\theta,1}^{-1}\right)
\]
as long as assumptions 1-11 on slide 54 of Topic 7 hold.
If the model is mis-specified, then the asymptotic distribution is:
\[
\sqrt{n}\left(\hat{\theta}_{n} - \theta\right) \xrightarrow{d} N\left(0,\ \mathcal{H}_{\theta,1}^{-1}\mathcal{I}_{\theta,1}\mathcal{H}_{\theta,1}^{-1}\right)
\]
8.1.3 Cramér-Rao Lower Bound
If a model is regular, parametric, and correctly specified with an interior solution for $\theta$ that is unbiased, then the lowest possible variance is the Cramér-Rao Lower Bound. That is:
\[
Var(\hat{\theta}_{n}) \geq (n\mathcal{I}_{\theta,1})^{-1}
\]
Note that if our data is i.i.d., then $n\mathcal{I}_{\theta,1} = \mathcal{I}_{\theta,n}$.
8.1.4 Estimating the Asymptotic Variance of the Estimator
I’m not going to type all of these out, but you can find the four different ways to estimate the
asymptotic variance at the end of the Topic 7 slides. Pay close attention to the definitions as the
differences between the estimates can be difficult to understand.
8.2 Practice Problem 1: Hansen 10.4
Let X be distributed Cauchy with density $f(x) = \frac{1}{\pi\left(1 + (x-\theta)^{2}\right)}$ for $x \in \mathbb{R}$.
(a) Find the log-likelihood function $\ell_{n}(\theta)$.
(b) Find the first-order condition for the MLE $\hat{\theta}$ for $\theta$. You will not be able to solve for $\hat{\theta}$.
8.2.1 Part a
We begin by finding the likelihood function. We will then take the natural log of that function. Using the definition of the likelihood function:
\begin{align*}
L(\theta|x) &= \prod_{i=1}^{n}\frac{1}{\pi\left(1 + (x_{i}-\theta)^{2}\right)} \\
\ell_{n}(\theta) &= \sum_{i=1}^{n} -\ln\left(\pi\left(1 + (x_{i}-\theta)^{2}\right)\right) \\
&= \sum_{i=1}^{n}\left[-\ln(\pi) - \ln\left(1 + (x_{i}-\theta)^{2}\right)\right] \\
&= -n\ln(\pi) - \sum_{i=1}^{n}\ln\left(1 + (x_{i}-\theta)^{2}\right)
\end{align*}
8.2.2 Part b
Now that we have the log-likelihood function, we can find the score vector and set that equal to zero:
\[
\frac{\partial \ell_{n}(\theta)}{\partial\theta} = \frac{\partial}{\partial\theta}\left[-n\ln(\pi) - \sum_{i=1}^{n}\ln\left(1 + (x_{i}-\theta)^{2}\right)\right] = \sum_{i=1}^{n}\frac{2(x_{i}-\theta)}{1 + (x_{i}-\theta)^{2}} = 0
\]
Usually we'd like to solve for $\hat{\theta}$, but note that here we cannot. We need a numerical solver to find the optimal estimator. So our work, analytically at least, is done.
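For intuition, here is what that numerical step looks like (a Python sketch with a hypothetical sample, using plain bisection on the score; note the Cauchy likelihood can be multimodal, so bisection finds a root of the score, and in practice you would compare candidate solutions):

```python
import numpy as np

x = np.array([-1.2, 0.3, 0.8, 1.1, 4.0])  # hypothetical data

def score(theta):
    # The first-order condition derived above.
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

# The score is positive at min(x) and negative at max(x), so a root lies between.
lo, hi = x.min(), x.max()
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if score(lo) * score(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta_hat = 0.5 * (lo + hi)
```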
8.3 Practice Problem 2: Hansen 10.7
Take the Pareto model $f(x) = \alpha x^{-1-\alpha}$, $x \geq 1$. Calculate the information for $\alpha$ using the second derivative.
8.3.1 Solution
We start with the likelihood function. We will then take the natural log to find the log-likelihood function.
\begin{align*}
L(\alpha|x) &= \prod_{i=1}^{n}\alpha x_{i}^{-1-\alpha} \\
\ell_{n}(\alpha) &= \sum_{i=1}^{n}\ln\left(\alpha x_{i}^{-1-\alpha}\right) \\
&= \sum_{i=1}^{n}\left[\ln(\alpha) - (1+\alpha)\ln(x_{i})\right] \\
&= n\ln(\alpha) - \sum_{i=1}^{n}(1+\alpha)\ln(x_{i})
\end{align*}
Now that we have the log-likelihood we can find the score vector:
\[
\frac{\partial \ell_{n}(\alpha)}{\partial\alpha} = \frac{\partial}{\partial\alpha}\left[n\ln(\alpha) - \sum_{i=1}^{n}(1+\alpha)\ln(x_{i})\right] = \frac{n}{\alpha} - \sum_{i=1}^{n}\ln(x_{i})
\]
The problem tells us to use the second derivative, so taking the derivative with respect to $\alpha$ again:
\[
\frac{\partial^{2}\ell_{n}(\alpha)}{\partial\alpha^{2}} = \frac{\partial}{\partial\alpha}\left[\frac{n}{\alpha} - \sum_{i=1}^{n}\ln(x_{i})\right] = -\frac{n}{\alpha^{2}}
\]
Now that we have the second derivative, we look at the formula for the expected Hessian:
\[
\mathcal{H}_{\theta,n} = E\left[\frac{\partial^{2}\ell_{n}(\alpha)}{\partial\alpha^{2}}\right] = E\left[-\frac{n}{\alpha^{2}}\right] = -\frac{n}{\alpha^{2}}
\]
Assuming the information matrix equality holds, then:
\[
\mathcal{I}_{\theta,n} = \frac{n}{\alpha^{2}}
\]
8.4 Practice Problem 3: Hansen 10.8
Find the Cramér-Rao lower bound for p in the Bernoulli model. In Section 10.3, we derived that the MLE for p is $\hat{p} = \bar{X}_{n}$. Compute $Var(\hat{p})$. Compare $Var(\hat{p})$ with the Cramér-Rao lower bound.
8.4.1 Finding the Cramér-Rao lower bound
We first note that the Cramér-Rao lower bound is given by $(n\mathcal{I}_{\theta,1})^{-1}$. So we know we need to first find the likelihood function. The pmf of a Bernoulli random variable is $\pi(x) = p^{x}(1-p)^{1-x}$. So:
\begin{align*}
L(p|x) &= \prod_{i=1}^{n}p^{x_{i}}(1-p)^{1-x_{i}} \\
\ell_{n}(p) &= \sum_{i=1}^{n}\ln\left(p^{x_{i}}(1-p)^{1-x_{i}}\right) \\
&= \sum_{i=1}^{n}\left[x_{i}\ln(p) + (1-x_{i})\ln(1-p)\right]
\end{align*}
Next, we know we need the score vector, so:
\[
\frac{\partial \ell_{n}(p)}{\partial p} = \frac{\partial}{\partial p}\left[\sum_{i=1}^{n}x_{i}\ln(p) + (1-x_{i})\ln(1-p)\right] = \frac{\sum_{i=1}^{n}x_{i}}{p} - \frac{\sum_{i=1}^{n}(1-x_{i})}{1-p}
\]
If we assume that the information matrix equality holds (which you'll show in Hansen 10.6), then we can use the second derivative:
\[
\frac{\partial^{2}\ell_{n}(p)}{\partial p^{2}} = \frac{\partial}{\partial p}\left[\frac{\sum_{i=1}^{n}x_{i}}{p} - \frac{\sum_{i=1}^{n}(1-x_{i})}{1-p}\right] = -\frac{\sum_{i=1}^{n}x_{i}}{p^{2}} - \frac{\sum_{i=1}^{n}(1-x_{i})}{(1-p)^{2}}
\]
Now take the negative expectation:
\begin{align*}
-E\left[\frac{\partial^{2}\ell_{n}(p)}{\partial p^{2}}\right] &= E\left[\frac{\sum_{i=1}^{n}x_{i}}{p^{2}} + \frac{\sum_{i=1}^{n}(1-x_{i})}{(1-p)^{2}}\right] \\
&= \frac{1}{p^{2}}\sum_{i=1}^{n}E[x_{i}] + \frac{n}{(1-p)^{2}} - \frac{\sum_{i=1}^{n}E[x_{i}]}{(1-p)^{2}} \\
&= \frac{np}{p^{2}} + \frac{n}{(1-p)^{2}} - \frac{np}{(1-p)^{2}} \\
&= \frac{n}{p} + \frac{n(1-p)}{(1-p)^{2}} \\
&= \frac{n - np + np}{p(1-p)} \\
&= \frac{n}{p(1-p)}
\end{align*}
Since the information matrix equality is assumed to hold (again, you'll show this in Hansen 10.6):
\[
\mathcal{I}_{\theta,n} = \frac{n}{p(1-p)}
\]
and because our data is i.i.d.:
\[
n\mathcal{I}_{\theta,1} = \frac{n}{p(1-p)}
\]
We have one last step. We must invert this last expression:
\[
(n\mathcal{I}_{\theta,1})^{-1} = \frac{p(1-p)}{n}
\]
This is the Cramér-Rao lower bound.
8.4.2 Variance of the Estimator
We still need to find $Var(\hat{p})$. So:
\begin{align*}
Var(\hat{p}) &= Var\left(\frac{1}{n}\sum_{i=1}^{n}x_{i}\right) = \frac{1}{n^{2}}Var\left(\sum_{i=1}^{n}x_{i}\right) \\
&= \frac{1}{n^{2}}\sum_{i=1}^{n}Var(x_{i}) = \frac{1}{n^{2}}\cdot np(1-p) = \frac{p(1-p)}{n}
\end{align*}
Note that this attains the Cramér-Rao lower bound.
8.5 Matlab Help
The Matlab this week is tricky again. The analytical part is fairly straightforward - just be sure to
check your algebra carefully! As a hint: define your data in terms of sample means to make the math
more tractable.
On part v, you are asked to derive estimates for the asymptotic covariance matrices. Another hint: you should arrive at the same matrix for both $\hat{V}_{0}$ and $\hat{V}_{1}$. If you don't, go back and rework the problem.
For the coding section, here are pictures of what your time series plots should look like:
As another checkpoint, your robust variance-covariance matrix for the Euro-zone should look like
this:
with your variances for $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\sigma}^{2}$ respectively down the diagonal.
Lastly, the very last section asks you to plot a regression line over the raw data for interest rate
differentials versus return rates. Your end result should look like:
Hopefully these checkpoints and hints are helpful. Good luck on the problem set.
Chapter 9
Method of Moments
9.1 Previous Problem 1: Matlab Theory Part
In the previous problem set, Drew gave us a linear regression model:
\[
y_{i} = \alpha + \beta x_{i} + \varepsilon_{i}, \qquad \varepsilon_{i} \sim N(0, \sigma^{2})
\]
where $y_{i} \sim N(\alpha + \beta x_{i}, \sigma^{2})$. Since $y_{i}$ is distributed normally, we know its pdf:
\[
f(y_{i}|x_{i}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2\sigma^{2}}(y_{i} - \alpha - \beta x_{i})^{2}}
\]
Conditioning on the $x_{i}$, we want to find the Fisher information matrix.
9.1.1 Conditional Log-Likelihood
Start first with the likelihood, then take the log:
\begin{align*}
L_{n}(\theta) &= \prod_{i=1}^{n}f(y_{i}|x_{i}) \\
\ell_{n}(\theta) &= \sum_{i=1}^{n}\ln\left(\frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2\sigma^{2}}(y_{i} - \alpha - \beta x_{i})^{2}}\right) \\
&= -\frac{n}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}
\end{align*}
9.1.2 Score Vector
Next we take the derivative of the log-likelihood with respect to each of our three parameters:
\begin{align*}
\frac{\partial \ell_{n}(\theta)}{\partial\alpha} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i}) \\
\frac{\partial \ell_{n}(\theta)}{\partial\beta} &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})x_{i} \\
\frac{\partial \ell_{n}(\theta)}{\partial\sigma^{2}} &= -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}
\end{align*}
Let's simplify one equation at a time, starting with $\alpha$:
\[
\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i}) = \frac{1}{\sigma^{2}}\left[\sum_{i=1}^{n}y_{i} - n\alpha - \beta\sum_{i=1}^{n}x_{i}\right] = \frac{1}{\sigma^{2}}\left[n\bar{y} - n\alpha - \beta n\bar{x}\right]
\]
Recall that we set these first-order conditions equal to zero, so we can multiply by $\sigma^{2}$, divide by n, and rearrange to get:
\[
0 = \bar{y} - \alpha - \beta\bar{x} \quad\Longrightarrow\quad \hat{\alpha} = \bar{y} - \beta\bar{x}
\]
Before we can advance further, we must solve for $\hat{\beta}$. Writing $\overline{xy} \equiv \frac{1}{n}\sum_{i=1}^{n}x_{i}y_{i}$ and $\overline{xx} \equiv \frac{1}{n}\sum_{i=1}^{n}x_{i}^{2}$:
\begin{align*}
0 &= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})x_{i} \\
&= \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\left(x_{i}y_{i} - \alpha x_{i} - \beta x_{i}^{2}\right) \\
&= \frac{1}{\sigma^{2}}\left[n\overline{xy} - n\alpha\bar{x} - n\beta\overline{xx}\right] \\
\hat{\beta} &= \frac{\overline{xy} - \alpha\bar{x}}{\overline{xx}}
\end{align*}
Now we can plug $\hat{\beta}$ into $\hat{\alpha}$:
\begin{align*}
\hat{\alpha} &= \bar{y} - \frac{\overline{xy} - \alpha\bar{x}}{\overline{xx}}\,\bar{x} \\
\hat{\alpha}\left(1 - \frac{\bar{x}^{2}}{\overline{xx}}\right) &= \bar{y} - \frac{\bar{x}\,\overline{xy}}{\overline{xx}} \\
\hat{\alpha}\left(\overline{xx} - \bar{x}^{2}\right) &= \overline{xx}\,\bar{y} - \overline{xy}\,\bar{x} \\
\hat{\alpha} &= \frac{\overline{xx}\,\bar{y} - \overline{xy}\,\bar{x}}{\overline{xx} - \bar{x}^{2}}
\end{align*}
We can then plug $\hat{\alpha}$ back into $\hat{\beta}$:
\begin{align*}
\hat{\beta} &= \frac{\overline{xy} - \frac{\overline{xx}\,\bar{y} - \overline{xy}\,\bar{x}}{\overline{xx} - \bar{x}^{2}}\,\bar{x}}{\overline{xx}} \\
&= \frac{\overline{xy}\left(\overline{xx} - \bar{x}^{2}\right) - \overline{xx}\,\bar{y}\,\bar{x} + \overline{xy}\,\bar{x}^{2}}{\left(\overline{xx} - \bar{x}^{2}\right)\overline{xx}} \\
&= \frac{\overline{xy}\,\overline{xx} - \overline{xx}\,\bar{x}\bar{y} - \bar{x}^{2}\,\overline{xy} + \bar{x}^{2}\,\overline{xy}}{\overline{xx}\left(\overline{xx} - \bar{x}^{2}\right)} \\
&= \frac{\overline{xy} - \bar{x}\bar{y}}{\overline{xx} - \bar{x}^{2}}
\end{align*}
Now we can solve for $\sigma^{2}$:
\begin{align*}
0 &= -\frac{n}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2} \\
\frac{n}{2\sigma^{2}} &= \frac{1}{2\sigma^{4}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2} \\
\hat{\sigma}^{2} &= \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - \hat{\alpha} - \hat{\beta}x_{i}\right)^{2}
\end{align*}
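These closed forms are easy to verify against a canned least-squares routine, since the conditional MLE of $(\alpha, \beta)$ coincides with OLS (a Python sketch with hypothetical data; the bar-named variables mirror the sample-moment notation above):

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

xbar, ybar = x.mean(), y.mean()
xybar, xxbar = np.mean(x * y), np.mean(x * x)

beta_hat = (xybar - xbar * ybar) / (xxbar - xbar**2)
alpha_hat = ybar - beta_hat * xbar
sigma2_hat = np.mean((y - alpha_hat - beta_hat * x) ** 2)  # note the 1/n divisor

# Compare with the least-squares fit from numpy: returns [slope, intercept].
coef = np.polyfit(x, y, 1)
```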
9.1.3 Deriving the Observed Hessian
The observed Hessian $H^{o}_{\theta,n}$ is defined as:
\[
H^{o}_{\theta,n} = \frac{\partial^{2}\ell_{n}(\theta)}{\partial\theta\partial\theta^{T}}
\]
Let's go one at a time:
\begin{align*}
\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha^{2}} &= -\frac{n}{\sigma^{2}} \\
\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha\partial\beta} &= -\frac{1}{\sigma^{2}}\sum_{i=1}^{n}x_{i} \\
\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha\partial\sigma^{2}} &= -\frac{1}{\sigma^{4}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i}) \\
\frac{\partial^{2}\ell_{n}(\theta)}{\partial\beta^{2}} &= -\frac{1}{\sigma^{2}}\sum_{i=1}^{n}x_{i}^{2} \\
\frac{\partial^{2}\ell_{n}(\theta)}{\partial\beta\partial\sigma^{2}} &= -\frac{1}{\sigma^{4}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})x_{i} \\
\frac{\partial^{2}\ell_{n}(\theta)}{\partial(\sigma^{2})^{2}} &= \frac{n}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\sum_{i=1}^{n}(y_{i} - \alpha - \beta x_{i})^{2}
\end{align*}
9.1.4 Information Matrix
If we assume the information matrix equality holds, then the expectation of the negative of the observed Hessian is the same as the information matrix. Recall that we treat $x_{i}$ as fixed data so that its mean is its expectation. Let's first take the negative expectation of the second derivatives we calculated:
\begin{align*}
-E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha^{2}}\right] &= \frac{n}{\sigma^{2}} & -E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha\partial\beta}\right] &= \frac{n}{\sigma^{2}}\bar{x} & -E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\alpha\partial\sigma^{2}}\right] &= 0 \\
-E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\beta^{2}}\right] &= \frac{n}{\sigma^{2}}\overline{xx} & -E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial\beta\partial\sigma^{2}}\right] &= 0 & -E\left[\frac{\partial^{2}\ell_{n}(\theta)}{\partial(\sigma^{2})^{2}}\right] &= \frac{n}{2\sigma^{4}}
\end{align*}
Note that we use both the property that the errors have mean zero and the property that the expected sum of squared errors equals $n\sigma^{2}$.
So the information matrix is:
\[
\mathcal{I}_{\theta,n} = \begin{bmatrix} \frac{n}{\sigma^{2}} & \frac{n}{\sigma^{2}}\bar{x} & 0 \\[4pt] \frac{n}{\sigma^{2}}\bar{x} & \frac{n}{\sigma^{2}}\overline{xx} & 0 \\[4pt] 0 & 0 & \frac{n}{2\sigma^{4}} \end{bmatrix}
\]
So that was a lot of work! Know how to calculate information matrices quickly for the comprehensive exam in June.
9.2 Method of Moments Theory
I will focus on the modern view of the Method of Moments, as Drew refers to it:
Let $\beta$ be a $k \times 1$ vector of unknown parameters and $m(\cdot)$ be a $k \times 1$ vector of functions that satisfy the population moment condition
\[
E[m(y, \beta)] = 0
\]
The method of moments estimator $\hat{\beta}_{n}$ is the solution to this system of k equations in k unknowns (also known as the solution to the moment conditions):
\[
\frac{1}{n}\sum_{i=1}^{n}m\left(y_{i}, \hat{\beta}_{n}\right) = 0
\]
This should look familiar! After all, we are basically using the analogy principle again.
A very simple example: estimating the population mean. Our moment condition is thus:
\[
E[y_{i} - \mu] = 0
\]
Pushing the expectation through:
\[
E[y_{i}] - \mu = 0
\]
Adding $\mu$ to both sides:
\[
E[y_{i}] = \mu
\]
Now we use the definition of the method of moments estimator:
\[
\frac{1}{n}\sum_{i=1}^{n}y_{i} = \hat{\mu}
\]
This is exactly like the analogy principle. It may be tempting to jump straight to the analogy principle, but really slow down and think through constructing the moment conditions.
Now suppose that we have a function of population moments:
\[
\beta = h(E[g(y)])
\]
We need to construct moment conditions for each population moment and for the function $\beta$. So:
\[
E[m(y, \beta, \theta)] = \begin{bmatrix} E[g(y)] - \theta \\ h(\theta) - \beta \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
Here, we would find our MoM estimators for each element of $\theta$ and then plug them into h to obtain our MoM plug-in estimator for $\beta$.
9.3 Practice Problem 1: Hansen 11.3
A Bernoulli random variable X satisfies
\[
P(X = 0) = 1 - p, \qquad P(X = 1) = p
\]
(a) Propose a moment estimator $\hat{p}$ for p.
(b) Find the variance of the asymptotic distribution of $\sqrt{n}(\hat{p} - p)$.
(c) Propose an estimator of the asymptotic variance of $\hat{p}$.
9.3.1 Part a
We begin with the moment conditions:
\[
E[x_{i} - \mu] = 0 \quad\Longrightarrow\quad E[x_{i}] = \mu
\]
Note that for a Bernoulli random variable $\mu = p$. Use the method of moments estimator:
\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_{i} \quad\Longrightarrow\quad \hat{p} = \frac{1}{n}\sum_{i=1}^{n}x_{i}
\]
9.3.2 Part b
Using the WLLN, we know that $\hat{p} \xrightarrow{p} p$. So we just need to find the variance:
\[
V = E[(x - p)(x - p)] = E[(x - p)^{2}] = p(1-p)
\]
9.3.3 Part c
This part is really as easy as it seems. Put hats on top of the unknown parameters in our asymptotic variance:
\[
\widehat{AVAR} = \hat{p}(1 - \hat{p})
\]
9.4 Practice Problem 2: Hansen 11.4
Propose a moment estimator $\hat{\lambda}$ for the parameter $\lambda$ of a Poisson distribution.
9.4.1 Solution
Set up our moment condition:
\[
E[x_{i} - \mu] = 0 \quad\Longrightarrow\quad E[x_{i}] = \mu
\]
Using the definition of the MoM estimator:
\[
\frac{1}{n}\sum_{i=1}^{n}x_{i} = \hat{\mu}
\]
Note that for the Poisson distribution $\mu = \lambda$, so:
\[
\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n}x_{i}
\]
9.5 Matlab Help
This is another week of brutal Matlab coding. This time, the homework asks you to code up a Newey-West variance estimator and run Monte Carlo simulations with it. The Newey-West estimator is defined as follows:
$$\hat{V}^{NW} = \hat{\Gamma}_0 + \sum_{j=1}^{q} k(j)\left(\hat{\Gamma}_j + \hat{\Gamma}_j'\right)$$
where q is the number of lags included and $k(j) = 1 - \frac{j}{q+1}$. Of course, we do not know the true autocovariance matrices, so we estimate them using:
$$\hat{\Gamma}_j = \frac{1}{T-1}\sum_{t=j+1}^{T}(y_t - \bar{y})(y_{t-j} - \bar{y})'$$
where T is the number of time-series observations. Pay close attention to the indices on the summations and the subscripts on the variables.
Feel free to drop the number of Monte Carlo simulations to 1000 for computation speed. Remember
that when doing Monte-Carlo simulation, each simulation should have its own Newey-West estimator,
mean, and data set.
Sometimes you’ll receive an error that you cannot Cholesky decompose the Newey-West estimator.
Change the number of lags included in the estimator before rewriting your code just to check if that
will solve the problem. On a side note, the Newey-West estimator has given me trouble every time I
have had to code it so far, so I wish you luck.
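The homework itself is in Matlab, but the logic of the estimator is easy to cross-check in a few lines. Here is a minimal Python sketch for a scalar series (the function name, the simulated data, and the choice q = 4 are my own, not from the problem set):

```python
import numpy as np

def newey_west(y, q):
    """Newey-West long-run variance of a scalar series y with q lags,
    following the formulas above (Bartlett weights k(j) = 1 - j/(q+1))."""
    y = np.asarray(y, dtype=float)
    T = y.size
    d = y - y.mean()

    # Gamma_j = 1/(T-1) * sum_{t=j+1}^T (y_t - ybar)(y_{t-j} - ybar)
    def gamma(j):
        return (d[j:] * d[:T - j]).sum() / (T - 1)

    v = gamma(0)
    for j in range(1, q + 1):
        k = 1.0 - j / (q + 1)       # kernel weight
        v += k * 2.0 * gamma(j)     # scalar case: Gamma_j + Gamma_j' = 2 * Gamma_j
    return v

rng = np.random.default_rng(0)
v_hat = newey_west(rng.standard_normal(500), q=4)
```

For i.i.d. data the estimate should be close to the ordinary variance; the lag terms only matter when the series is autocorrelated.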
Here are some pictures of the histograms and time series I got to guide your journey:
[Figures: histograms of the Monte Carlo estimates and the simulated time series]
Chapter 10
Hypothesis Testing and Confidence Intervals
10.1 Previous Problem 1: OLS Estimator
The previous problem set went well, so I will cover a quick question just to review Method of Moments
estimation. Looking at question 5 from PS 9:
Consider the k × 1 vector of population moment conditions
$$E[x_i(y_i - x_i'\beta)] = 0$$
where β is a k × 1 vector of unknown parameters, $y_i$ is a scalar and $x_i$ is a k × 1 vector.
(a) Solve for the method of moments population parameter $\beta_0$.
(b) In the solution to (a), the population method of moments parameter $\beta_0$ is only identified if what condition holds?
(c) Suppose you observe an i.i.d. sample $\{y_i, x_i\}_{i=1}^{n}$ from some unknown "true" joint distribution $f(y_i, x_i)$. Using the analogy principle, propose estimators for the following population moments:
$$E[x_i x_i'], \qquad E[x_i x_i']^{-1}, \qquad E[x_i y_i]$$
Are the estimators you proposed consistent for the population moments?
(d) Consider again the i.i.d. sample $\{y_i, x_i\}_{i=1}^{n}$ from $f(y_i, x_i)$. Given your solution to part (a), use the analogy principle to propose a method of moments estimator $\hat{\beta}_n$ for the population parameter $\beta_0$.
10.1.1 Part a
We first note the moment conditions:
$$E[x_i(y_i - x_i'\beta)] = 0$$
Now, solve for $\beta_0$:
$$E[x_i y_i] - E[x_i x_i']\beta = 0$$
$$E[x_i y_i] = E[x_i x_i']\beta$$
$$\beta_0 = E[x_i x_i']^{-1}E[x_i y_i]$$
10.1.2 Part b
For there to be one unique solution for $\beta_0$, we need to be able to solve the equation. $E[x_i y_i]$ shouldn't be a problem, but $E[x_i x_i']^{-1}$ may cause issues. If $E[x_i x_i']$ is singular, then we will not be able to find $\beta_0$. For a matrix to be invertible, it needs to be full rank. Therefore, we need $E[x_i x_i']$ to be full rank. This requirement is known as the rank condition.
10.1.3 Part c
Simply apply the analogy principle:
$$E[x_i x_i'] \;\rightarrow\; \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \tag{10.1}$$
$$E[x_i x_i']^{-1} \;\rightarrow\; \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1} \tag{10.2}$$
$$E[x_i y_i] \;\rightarrow\; \frac{1}{n}\sum_{i=1}^{n} x_i y_i \tag{10.3}$$
We know that (10.1) and (10.3) converge to the population moments by the WLLN. Because (10.1) converges by the WLLN, and because matrix inversion is continuous at any invertible matrix (we assume the matrix is invertible, so this is not a problem), (10.2) converges by the CMT.
10.1.4 Part d
Simply plug in our estimators from (c):
$$\hat{\beta}_n = \left(\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i y_i\right)$$
We have just derived the OLS estimator using the Method of Moments. We now know that OLS is both a Maximum Likelihood Estimator and a Method of Moments estimator.
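As a quick numerical sanity check, the moment-based formula reproduces the least-squares solution exactly. A minimal Python sketch (the course code is in Matlab; the simulated DGP here is my own invention):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.standard_normal(n)

# Method of moments estimator: (1/n sum x_i x_i')^{-1} (1/n sum x_i y_i)
Sxx = X.T @ X / n
Sxy = X.T @ y / n
beta_hat = np.linalg.solve(Sxx, Sxy)

# the ordinary least-squares solution for comparison
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The two vectors agree to machine precision, and both should be close to the true β in a sample this large.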
10.2 Hypothesis Testing
From our undergraduate experience, we should be familiar with hypothesis testing. First, we set up
two competing hypotheses:
$$H_0: \theta = \theta_0 \qquad\qquad H_A: \theta \neq \theta_0$$
$H_A$ here is set up as a two-sided test. We could also theoretically set up a one-sided test (e.g. $H_A: \theta < \theta_0$) but, in economics at least, we want to remain open to results on either side of $\theta_0$.
How do we determine whether we have sufficient evidence to reject $H_0$ (the null hypothesis)? First, we must set the level of the test, α. The level α is the maximum probability, under the null, of obtaining a result at least as extreme as the one we observe that we are willing to tolerate. In other words, we reject only if our result would occur in fewer than α of the draws from the null distribution.
Of course, the null distribution matters here in finite samples. Since calculating the critical value (the point of the distribution beyond which more extreme values make up α of the area under the density) is difficult for some distributions, and in the real world we may not know the distribution under the null hypothesis, we turn to asymptotic theory.
Recall the central limit theorem. For i.i.d. $x_i$:
$$\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} \xrightarrow{d} N(0, 1)$$
So let's make use of this! We can easily find the critical values associated with whichever level α (usually 0.01, 0.05, or 0.10) we choose. If our test statistic lands beyond the critical values, we can reject the null.
Remembering that we compare our value with a distribution under the null hypothesis, we find the following test statistic:
$$T(y, \mu_0) = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\sigma}$$
where $\mu_0$ is the value stated under the null hypothesis. If the null hypothesis is true, the asymptotic distribution is N(0, 1). But if the null hypothesis is not true, then we will draw a value from a much different distribution. We compare our t-statistic to the critical value associated with α (1.96 if α = 0.05) and decide whether to reject.
Another way of determining whether to reject our null hypothesis is to construct a confidence interval. The confidence interval gives us a range of values that, $(1-\alpha)$ of the time, will contain the true parameter value. To construct a confidence interval, we calculate:
$$CI_{1-\alpha} = \left[\bar{x} - cv\cdot\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + cv\cdot\frac{\sigma}{\sqrt{n}}\right]$$
where cv denotes the critical value associated with the asymptotic level.
If our null hypothesis is inside the confidence interval, we cannot reject the null at our α level.
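The mechanics above fit in a few lines. A Python sketch (rather than the course's Matlab; the simulated data and the null value are my own choices):

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(2)
x = rng.normal(loc=0.3, scale=1.0, size=400)   # sample with true mean 0.3
mu0 = 0.0                                      # null hypothesis H0: mu = 0
n, xbar, s = x.size, x.mean(), x.std(ddof=1)

# asymptotic t-statistic and two-sided test at alpha = 0.05
t_stat = sqrt(n) * (xbar - mu0) / s
cv = 1.96
reject = abs(t_stat) > cv

# 95% confidence interval for the mean
ci = (xbar - cv * s / sqrt(n), xbar + cv * s / sqrt(n))
```

With the true mean well away from the null, the statistic is large and the interval excludes $\mu_0$: the two decision rules agree.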
10.3 Other Tests
There are other tests we can use to decide whether to reject null hypotheses.
10.3.1 Likelihood Ratio Test
The likelihood ratio test statistic LR(y) for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is, in logs:
$$lr(y) = 2\left[\ln L(\hat{\theta}\,|\,y) - \ln L(\hat{\theta}_0\,|\,y)\right]$$
where L(·) denotes the likelihood function from the MLE section.
The asymptotic distribution of the likelihood ratio is:
$$lr(y) \xrightarrow{d} \chi^2_r$$
where r is the number of restrictions (null hypotheses tested).
10.3.2 Wald Test
The Wald test is as follows:
$$T_w(y) = n\,g(\hat{\theta})'\left(\frac{\partial g(\hat{\theta})}{\partial \theta'}\; I^{-1}_{\hat{\theta}_n,1}\; \frac{\partial g(\hat{\theta})}{\partial \theta'}^{\,\prime}\right)^{-1} g(\hat{\theta})$$
where g(θ) relates the true θ and the null $\theta_0$. Usually, we like a linear g(θ):
$$g(\theta) = \theta - \theta_0 = 0$$
as the resulting Wald statistic is the square of the t-test!
The asymptotic distribution of the Wald test is:
$$T_w(y) \xrightarrow{d} \chi^2_r,$$
the same as the Likelihood Ratio test.
10.3.3 Score Test
The score test is as follows:
$$T_s(y) = \frac{1}{n}\left(\frac{\partial \ell_n(\hat{\theta}^0_n)}{\partial \theta}\right)' I^{-1}_{\hat{\theta}^0_n,1}\left(\frac{\partial \ell_n(\hat{\theta}^0_n)}{\partial \theta}\right)$$
This too has asymptotic distribution:
$$T_s(y) \xrightarrow{d} \chi^2_r$$
Note that the superscript indicates whether we are using the null value or our estimate when setting up the tests.
10.4 Practice Problem 1: Hansen 13.3 Extended
Take the exponential model with parameter λ. We want a test of $H_0: \lambda = 1$ against $H_A: \lambda \neq 1$.
(a) Develop a test based on the sample mean $\bar{X}_n$.
(b) Find the likelihood ratio statistic.
(c) Find the score test.
(d) Find the Wald statistic.
10.4.1 Part a
We begin with the standard asymptotic t-test:
$$T(x, \mu_0) = \frac{\sqrt{n}(\bar{x} - \mu_0)}{\hat{s}} = \frac{\sqrt{n}(\bar{x} - \lambda_0)}{\hat{s}} = \frac{\sqrt{n}(\bar{x} - 1)}{\hat{s}}$$
This is our asymptotic t-test.
Note that in this scenario, because we know the true distribution, we can actually calculate the standard deviation:
$$Var(\sqrt{n}\,\bar{x}) = n\,Var\!\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) = \frac{1}{n}\sum_{i=1}^{n}Var(x_i) = \frac{1}{n}\sum_{i=1}^{n}\hat{\lambda}^2 = \hat{\lambda}^2 = \bar{x}^2$$
Then our t-test becomes:
$$T(x) = \frac{\sqrt{n}(\bar{x} - 1)}{\hat{s}} = \frac{\sqrt{n}(\bar{x} - 1)}{\sqrt{\bar{x}^2}} = \frac{\sqrt{n}(\bar{x} - 1)}{\bar{x}}$$
10.4.2 Part b
Using the formula for the likelihood ratio statistic, and noting that the exponential distribution has pdf $f(x|\lambda) = \frac{1}{\lambda}e^{-x/\lambda}$:
$$lr(x) = 2\left[\ln L(\hat{\lambda}\,|\,x) - \ln L(\lambda_0\,|\,x)\right] = 2\left[\ln\left(\prod_{i=1}^{n}\frac{1}{\hat{\lambda}}e^{-x_i/\hat{\lambda}}\right) - \ln\left(\prod_{i=1}^{n}\frac{1}{1}e^{-x_i/1}\right)\right]$$
$$= 2\left[\sum_{i=1}^{n}\left(-\ln(\hat{\lambda}) - \frac{x_i}{\hat{\lambda}}\right) + \sum_{i=1}^{n}x_i\right] = 2\left[-n\ln(\hat{\lambda}) - \frac{n\bar{x}_n}{\hat{\lambda}} + n\bar{x}_n\right]$$
$$= 2\left[n\bar{x}_n - n - n\ln(\hat{\lambda})\right] \xrightarrow{d} \chi^2_1$$
where the last step uses $\hat{\lambda} = \bar{x}_n$, so that $n\bar{x}_n/\hat{\lambda} = n$.
10.4.3 Part c
We first need to find the score evaluated at the null hypothesis of $\lambda_0 = 1$:
$$\left.\frac{d\ell_n(\lambda)}{d\lambda}\right|_{\lambda=\lambda_0} = -\frac{n}{\lambda_0} + \frac{n}{\lambda_0^2}\bar{x}_n = -\frac{n}{1} + \frac{n}{1}\bar{x}_n = n(\bar{x}_n - 1)$$
Assuming the information matrix equality holds, the negative Hessian, and thus the information, evaluated at the null value is:
$$I_{\lambda_0,1} = \frac{1}{\lambda_0^2} = 1$$
So the score test is:
$$T_s(x) = \frac{1}{n}\,n(\bar{x}_n - 1)\cdot 1\cdot n(\bar{x}_n - 1) = n(\bar{x}_n - 1)^2 \xrightarrow{d} \chi^2_1$$
10.4.4 Part d
This is the last test to derive. Using the constraint g(·) defined above:
$$g(\lambda) = \lambda - \lambda_0 = \lambda - 1 = 0$$
Now we need the derivative of g(λ) with respect to the true λ:
$$\frac{\partial g(\lambda)}{\partial \lambda} = 1$$
Using the Wald statistic formula from above:
$$T_w(x) = n\,g(\hat{\lambda}_n)\left(\frac{\partial g(\lambda)}{\partial \lambda}\; I^{-1}_{\hat{\lambda}_n,1}\; \frac{\partial g(\lambda)}{\partial \lambda}\right)^{-1} g(\hat{\lambda}_n) = n(\hat{\lambda}_n - 1)\cdot\frac{1}{\hat{\lambda}_n^2}\cdot(\hat{\lambda}_n - 1)$$
$$= \frac{n(\hat{\lambda}_n - 1)^2}{\hat{\lambda}_n^2} = \frac{n(\bar{x}_n - 1)^2}{\bar{x}_n^2} \xrightarrow{d} \chi^2_1$$
This result is the square of our t-test, as expected.
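The four statistics are easy to check against each other numerically. A hedged Python sketch (simulated exponential data of my own choosing); with $\hat{\lambda} = \bar{x}_n$, the Wald statistic should equal the square of the t-statistic exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true = 1.3
x = rng.exponential(scale=lam_true, size=200)   # exponential with mean lambda
n, xbar = x.size, x.mean()

t_stat = np.sqrt(n) * (xbar - 1.0) / xbar            # part (a)
lr = 2.0 * (n * xbar - n - n * np.log(xbar))         # part (b), lambda_hat = xbar
score = n * (xbar - 1.0) ** 2                        # part (c)
wald = n * (xbar - 1.0) ** 2 / xbar ** 2             # part (d)
```

All three chi-squared statistics are non-negative by construction, and under $H_A$ (the true λ here is 1.3) they tend to be large.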
10.5 Some Matlab Help
I did not code up your Matlab problem myself this week. Instead, I rely on Drew's sample code from last year. Ultimately, you end up finding the values of the likelihood ratio, Wald, and score tests for a data set where we assume the true DGP is Poisson. Therefore, your score vectors, Hessians, and information matrices should be derived from this Poisson distribution.
As checkpoints for you while coding, here are the values you should reach for each test if you set your random number generator seed to 10:
[Figure: checkpoint values, with the likelihood ratio first, the Wald second, and the score third]
Good luck on the homework!
Chapter 11
Basics of Regression
11.1 Previous Problem 1: Hansen 13.1 Extended
Take the Bernoulli model with probability parameter p. We want a test of $H_0: p = 0.05$ against $H_1: p \neq 0.05$.
(a) Develop a test based on the sample mean $\bar{x}_n$.
(b) Derive the likelihood ratio statistic. What is its asymptotic sampling distribution?
(c) Derive the score test. What is its asymptotic sampling distribution?
(d) Derive the Wald test. What is its asymptotic sampling distribution?
11.1.1 Part a
Because we know that the mean of a Bernoulli random variable is p, our estimator $\hat{p}$ is simply $\bar{x}_n$. We use a t-test for (a):
$$T = \frac{\sqrt{n}(\bar{x}_n - p_0)}{\hat{s}} = \frac{\sqrt{n}(\bar{x}_n - 0.05)}{\hat{s}}$$
Because we know the distribution of our random variable, we replace $\hat{s}$ with the plug-in estimate of the true standard deviation. Bernoulli distributions have variance $p(1-p)$, so our t-test becomes:
$$T = \frac{\sqrt{n}(\bar{x}_n - 0.05)}{\sqrt{\hat{p}(1-\hat{p})}} = \frac{\sqrt{n}(\bar{x}_n - 0.05)}{\sqrt{\bar{x}_n(1-\bar{x}_n)}}$$
11.1.2 Part b
Our $p_0 = 0.05$, so using the formula for the likelihood ratio statistic:
$$lr_n = 2\left[\sum_{i=1}^{n}\ln\left(\hat{p}^{y_i}(1-\hat{p})^{1-y_i}\right) - \sum_{i=1}^{n}\ln\left(p_0^{y_i}(1-p_0)^{1-y_i}\right)\right]$$
$$= 2\left[\sum_{i=1}^{n}\big(y_i\ln(\hat{p}) + (1-y_i)\ln(1-\hat{p})\big) - \sum_{i=1}^{n}\big(y_i\ln(0.05) + (1-y_i)\ln(1-0.05)\big)\right] \xrightarrow{d} \chi^2_1$$
where the one degree of freedom comes from the one restriction we impose (the null hypothesis).
11.1.3 Part c
To derive the score test, we first need to find the score vector evaluated at the restricted value:
$$\left.\frac{\partial \ell_n(p)}{\partial p}\right|_{p=p_0} = \left[\sum_{i=1}^{n}\frac{y_i}{p} - \sum_{i=1}^{n}\frac{1-y_i}{1-p}\right]_{p=p_0} = \sum_{i=1}^{n}\frac{y_i}{0.05} - \sum_{i=1}^{n}\frac{1-y_i}{1-0.05}$$
$$= \frac{n\bar{y}_n}{0.05} - \frac{n - n\bar{y}_n}{1-0.05} = n\bar{y}_n\left(20 + \frac{20}{19}\right) - \frac{20n}{19} = n\bar{y}_n\,\frac{400}{19} - \frac{20n}{19}$$
We now need the information matrix evaluated under the null hypothesis. Knowing that this parametric model is correctly specified (because we know the underlying distribution) and regular, we can use the information matrix equality:
$$I_{p_0,1} = -E\left[\left.\frac{\partial^2 \ell_1(p)}{\partial p^2}\right|_{p=p_0}\right] = E\left[\frac{y}{p^2} - \frac{y}{(1-p)^2} + \frac{1}{(1-p)^2}\right]_{p=p_0} = \left[\frac{p}{p^2} - \frac{p}{(1-p)^2} + \frac{1}{(1-p)^2}\right]_{p=p_0}$$
$$= \left[\frac{1}{p} + \frac{1}{1-p}\right]_{p=p_0} = \left[\frac{1-p+p}{p(1-p)}\right]_{p=p_0} = \left[\frac{1}{p(1-p)}\right]_{p=p_0} = \frac{1}{0.05(1-0.05)} = \frac{400}{19}$$
We now have all of the pieces that we need. Plugging these into the score statistic formula:
$$T_s = \frac{1}{n}\left(n\bar{y}\,\frac{400}{19} - \frac{20n}{19}\right)\frac{19}{400}\left(n\bar{y}\,\frac{400}{19} - \frac{20n}{19}\right) \xrightarrow{d} \chi^2_1$$
11.1.4 Part d
We first need to set up our g. As I noted last week, the easiest way to do this is linearly:
$$g(p) = p - p_0 = p - 0.05 = 0$$
Next we need to take the derivative of g(p) with respect to p:
$$\frac{\partial g(p)}{\partial p} = 1$$
Taking the $I_{p_0,1}$ we estimated above, simply replace $p_0$ with $\hat{p}$ and we have the information matrix we need. Plugging everything into the Wald statistic formula:
$$T_w = n(\hat{p} - 0.05)\big(1\cdot\hat{p}(1-\hat{p})\cdot 1\big)^{-1}(\hat{p} - 0.05) = \frac{n(\hat{p} - 0.05)^2}{\hat{p}(1-\hat{p})} \xrightarrow{d} \chi^2_1$$
Notice that because $\hat{p} = \bar{x}_n$ and because g(p) is linear, our Wald statistic is simply the square of our t-statistic.
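As with the exponential example, the relations among the statistics can be verified numerically. A Python sketch (simulated Bernoulli data with a true p of 0.10, my own choice, so the null p = 0.05 is false):

```python
import numpy as np

rng = np.random.default_rng(4)
p0 = 0.05
y = rng.binomial(1, 0.10, size=1000)
n, p_hat = y.size, y.mean()

t_stat = np.sqrt(n) * (p_hat - p0) / np.sqrt(p_hat * (1 - p_hat))
wald = n * (p_hat - p0) ** 2 / (p_hat * (1 - p_hat))
lr = 2 * (np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
          - np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0)))

# score statistic: (1/n) * S(p0) * I(p0)^{-1} * S(p0)
score_vec = n * p_hat / p0 - (n - n * p_hat) / (1 - p0)
info = 1 / (p0 * (1 - p0))
score = (1 / n) * score_vec * (1 / info) * score_vec
```

The Wald statistic equals the squared t-statistic exactly, and all three chi-squared statistics are large here because the null is false.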
11.2 Regression Theory
Know the basics of linear regression theory. Putting in the work now will pay off next semester in Metrics II. Some of the material discussed below will seem unimportant now but will become important when proving results in the Spring.
11.2.1 Two Special Matrices
Projection Matrix
Let X be an n × k matrix that is full rank. Then the projection matrix P is the n × n matrix
$$P = X(X'X)^{-1}X'$$
The projection matrix can be shown to have the following properties:
(i) PX = X
(ii) P = P'
(iii) PP = P
(iv) tr(P) = k and rank(P) = k
We can use the projection matrix to find the fitted values $\hat{y}$ in our regressions.
Annihilator Matrix
Let X be an n × k matrix that is full rank. Then the annihilator matrix M is the n × n matrix
$$M = I - P = I - X(X'X)^{-1}X'$$
The annihilator matrix can be shown to have the following properties:
(i) MX = 0
(ii) MP = 0
(iii) M = M'
(iv) MM = M
(v) tr(M) = n − k and rank(M) = n − k
We can use the annihilator matrix to remove parts of the regression that we are not interested in estimating. Why you would do this may not be obvious now, but you will see its usefulness when you cover the Frisch-Waugh-Lovell Theorem. We can also use the annihilator matrix to find residuals.
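The listed properties are easy to verify numerically. A minimal Python sketch (random full-rank X of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 20, 3
X = rng.standard_normal((n, k))            # full-rank n x k regressor matrix

P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection matrix
M = np.eye(n) - P                          # annihilator matrix
```

Checking PX = X, symmetry, idempotency, MX = 0, MP = 0, and the trace/rank conditions confirms the algebra above.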
11.2.2 Normal Linear Regression
The normal linear regression model is as follows:
$$y_i = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$
When using normal linear regression, we assume the following:
(i) The true model is linear and E[ε|X] = 0.
(ii) The data $\{y_i, x_i\}$ are i.i.d.
(iii) $E[y_i^2] < \infty$ and $E[\|x_i\|^2] < \infty$.
(iv) $E[x_i x_i']$ is full rank.
(v) The conditional errors $\varepsilon_i | x_i$ are Gaussian.
Using maximum likelihood estimation, since we know the underlying DGP is normal, we can find that our estimators are:
$$\hat{\beta} = (X'X)^{-1}X'y = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right)$$
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i'\hat{\beta}\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\hat{e}_i^2$$
Using normal linear regression, we can easily derive finite-sample distributions. Usually we don't want to assume normally distributed errors. Luckily for us, the best linear predictor follows the same process! But we will need to appeal to asymptotic theory to conduct our hypothesis testing.
11.2.3 Assumptions on the Error Terms
I will lay out the two main error assumptions here.
Strict Exogeneity
Strict exogeneity is the strongest assumption we can impose on the data. We have strict exogeneity when:
$$E[\varepsilon_i\,|\,x_i] = 0 \quad \forall\, i$$
Strict exogeneity is nice as it implies that the errors have a mean of zero:
$$E[\varepsilon_i] = E_x\big[E[\varepsilon_i\,|\,x_i]\big] = E_x[0] = 0 \qquad \text{(using the LIE)}$$
In addition, strict exogeneity implies that the covariance between $x_i$ and $\varepsilon_i$ is zero:
$$Cov(x_i, \varepsilon_i) = E[x_i\varepsilon_i] - E[x_i]E[\varepsilon_i] = E[x_i\varepsilon_i] = E_x\big[E[x_i\varepsilon_i\,|\,x_i]\big] = E_x\big[x_i E[\varepsilon_i\,|\,x_i]\big] = E_x[x_i\cdot 0] = 0$$
So strict exogeneity provides lots of nice properties. But this is a stronger assumption than we need.
Orthogonality (Exogenous Variables)
We actually only need the orthogonality condition:
$$E[x_i\varepsilon_i] = 0$$
Note that we proved that this result follows naturally from the covariance proof for strict exogeneity.
As such, strict exogeneity implies orthogonality.
11.2. REGRESSION THEORY 97
11.2.4 The OLS Estimator
Sometimes our regression doesn't have normal errors, or the conditional expectation function underlying the model isn't linear. If we still want to use our regression tools, we need a best linear predictor. We use the OLS estimator for this job. Note that if the CEF is non-linear, OLS is the best linear predictor of the CEF but is not the best predictor of the CEF.
As stated above, our OLS estimator under a non-normal regression is the same as the maximum likelihood estimator for a normal regression. To see this, we start with minimizing the expected squared residual:
$$\beta = \underset{\beta\in\mathbb{R}^k}{\text{argmin}}\; E\big[(y_i - x_i'\beta)'(y_i - x_i'\beta)\big]$$
Differentiating with respect to β:
$$\frac{\partial}{\partial\beta} = E\big[-x_i(y_i - x_i'\beta) - x_i(y_i - x_i'\beta)\big] = E\big[-x_i y_i + x_i x_i'\beta - x_i y_i + x_i x_i'\beta\big]$$
Because we are minimizing, we set this equal to zero:
$$0 = 2E[x_i x_i']\beta - 2E[x_i y_i]$$
$$E[x_i x_i']\beta = E[x_i y_i]$$
$$\beta = E[x_i x_i']^{-1}E[x_i y_i]$$
To estimate, we employ the analogy principle:
$$\hat{\beta}_{OLS} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right)$$
11.2.5 Consistency and Asymptotic Normality
For most estimators, we want both consistency for obvious reasons and asymptotic normality for
hypothesis testing.
Consistency
If we are going to use OLS, we should make sure that under our assumptions OLS is consistent for the best linear predictor. To do this, we start from the estimator derived above and simplify until we can use our asymptotic theorems: the WLLN, CMT, and Slutsky's. Completing the proof requires us to recognize that $y_i = x_i'\beta + \varepsilon_i$:
$$\hat{\beta}_{OLS} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i(x_i'\beta + \varepsilon_i)\right)$$
$$= \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)\beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\varepsilon_i\right)$$
$$= \beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\varepsilon_i\right)$$
Taking the limit as $n\to\infty$, we can apply the WLLN and CMT to find that:
$$\hat{\beta}_{OLS} \xrightarrow{P} \beta + E[x_i x_i']^{-1}E[x_i\varepsilon_i]$$
Recalling our assumption of orthogonality, $E[x_i\varepsilon_i] = 0$, so this becomes:
$$\beta + E[x_i x_i']^{-1}\cdot 0 = \beta$$
Therefore:
$$\hat{\beta}_{OLS} \xrightarrow{P} \beta$$
Asymptotic Normality
So our OLS estimator is consistent. Now we want to ensure that we can easily test hypotheses, so we want to see if it is asymptotically normal. Following the same set-up steps as in the consistency proof:
$$\hat{\beta}_{OLS} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right) = \beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\varepsilon_i\right)$$
Now, we subtract β from both sides and multiply by $\sqrt{n}$:
$$\sqrt{n}\left(\hat{\beta}_{OLS} - \beta\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i\varepsilon_i\right)$$
This set-up should look familiar: the right-hand side now looks like the Central Limit Theorem. We take advantage of the multivariate CLT, which says that if the $\omega_i$ are i.i.d. with
$$E[\omega_i] = 0, \qquad Var(\omega_i) = \Sigma,$$
then
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}\omega_i\right) \xrightarrow{d} N(0, \Sigma).$$
Note that $x_i\varepsilon_i$ meets each of these requirements, so if we take the limit as $n\to\infty$:
$$\sqrt{n}\left(\hat{\beta}_{OLS} - \beta\right) \xrightarrow{d} E[x_i x_i']^{-1}\cdot N\big(0,\;E[x_i\varepsilon_i\varepsilon_i x_i']\big) = N\big(0,\;E[x_i x_i']^{-1}E[x_i\varepsilon_i\varepsilon_i x_i']\,E[x_i x_i']^{-1}\big)$$
The asymptotic variance above is the robust asymptotic variance (this simply means that we do not assume homoskedasticity in the error terms, i.e. we allow $Var(\varepsilon_i|x_i) = E[\varepsilon_i^2|x_i] = \sigma^2(x_i)$ to depend on $x_i$).
If we do assume homoskedasticity ($Var(\varepsilon_i|x_i) = E[\varepsilon_i^2|x_i] = \sigma^2$ for all $x_i$), then we converge to:
$$\sqrt{n}\left(\hat{\beta}_{OLS} - \beta\right) \xrightarrow{d} N\big(0,\;\sigma^2 E[x_i x_i']^{-1}E[x_i x_i']\,E[x_i x_i']^{-1}\big) = N\big(0,\;\sigma^2 E[x_i x_i']^{-1}\big)$$
Know these proofs backward and forward, and know how the assumptions we make play into them. Having a strong foundation in consistency and asymptotic normality proofs will be extremely helpful for Metrics II.
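A small Monte Carlo makes the homoskedastic result concrete. This is a Python sketch under an invented DGP: for a scalar regressor with $E[x_i^2] = 1$ and $\sigma^2 = 1$, the variance of $\sqrt{n}(\hat{\beta} - \beta)$ should be close to $\sigma^2 E[x_i^2]^{-1} = 1$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta, sigma = 200, 2000, 1.5, 1.0
draws = np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)                 # scalar regressor, E[x^2] = 1
    eps = rng.normal(0.0, sigma, size=n)       # homoskedastic errors
    y = x * beta + eps
    b_hat = (x @ y) / (x @ x)                  # OLS in the scalar case
    draws[r] = np.sqrt(n) * (b_hat - beta)

sim_var = draws.var()                          # should be near 1
```

The simulated draws are centered at zero with variance near one, as the asymptotic approximation predicts.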
11.3 Matlab Help
Today’s recitation was full of theory, as there was lots to go through. I wanted to get through a big
chunk of regression theory as I am not sure what the next recitation session will hold (depending on
scheduling I may use the next recitation to prepare for the final). Even still, I think providing some
Matlab help is prudent.
For this next problem set, you get to recreate Stata output in Matlab. This Matlab exercise is
one of the most useful - and cool - problems you will do this semester. To make both this exercise
and, potentially, future exercises easier, create your regression function in a separate file. Create this
function in such a way that it outputs the tables with labels automatically.
In addition to the homoskedastic standard errors, try also including robust standard errors in your
table and compare the numbers. To make sure your output is correct, you can always input the data
into Stata and compare.
Note that to include a constant in your regression, add a vector of ones as a new "variable" to your data. Here is a snapshot of the inputs I have to my function:
[Figure: regression function inputs]
Here is the code I use to generate my tables:
[Figure: table-generation code]
And here is what my table output looks like for Model 2 (the confidence intervals still use homoskedastic standard errors even though I also report robust standard errors):
[Figure: Model 2 output table]
11.4 Previous Problem 2: Hansen II 3.13
Let $D_1$ and $D_2$ be vectors of ones and zeroes, with the $i^{th}$ element of $D_1$ equaling one if that observation is male and zero if that observation is female ($D_2$ being the opposite). Then:
(a) In the OLS regression
$$Y = D_1\hat{\gamma}_1 + D_2\hat{\gamma}_2 + \hat{\mu}$$
show that $\hat{\gamma}_1$ is the sample mean of the dependent variable among men in the sample and that $\hat{\gamma}_2$ is the sample mean among women.
(b) Let X (n × k) be an additional matrix of regressors. Describe in words the transformations
$$Y^* = Y - D_1\bar{Y}_1 - D_2\bar{Y}_2, \qquad X^* = X - D_1\bar{X}_1' - D_2\bar{X}_2'$$
where $\bar{X}_1$ and $\bar{X}_2$ are the k × 1 means of the regressors for men and women, respectively.
(c) Compare $\tilde{\beta}$ from the OLS regression
$$Y^* = X^*\tilde{\beta} + \tilde{e}$$
with the $\hat{\beta}$ from the OLS regression
$$Y = D_1\hat{\alpha}_1 + D_2\hat{\alpha}_2 + X\hat{\beta} + \hat{e}$$
11.4.1 Part a
We first take the general formula for the OLS estimator:
$$\hat{\beta} = (X'X)^{-1}(X'Y)$$
Looking at the OLS equation we are estimating, we see that
$$X = \begin{bmatrix} D_1 & D_2 \end{bmatrix}$$
Calculating X'X then:
$$X'X = \begin{bmatrix} D_1'D_1 & D_1'D_2 \\ D_2'D_1 & D_2'D_2 \end{bmatrix} = \begin{bmatrix} n_1 & 0 \\ 0 & n_2 \end{bmatrix}$$
We can then invert this by using the properties of a block-diagonal matrix:
$$(X'X)^{-1} = \begin{bmatrix} \frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2} \end{bmatrix}$$
Now we need to find the second part of the OLS estimator:
$$X'Y = \begin{bmatrix} D_1'Y \\ D_2'Y \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i\,\mathbb{1}(d_{1,i}=1) \\ \sum_{i=1}^{n} y_i\,\mathbb{1}(d_{2,i}=1) \end{bmatrix}$$
Combining the two pieces together, we get:
$$\hat{\beta}_{OLS} = \begin{bmatrix} \frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2} \end{bmatrix}\begin{bmatrix} \sum_{i=1}^{n} y_i\,\mathbb{1}(d_{1,i}=1) \\ \sum_{i=1}^{n} y_i\,\mathbb{1}(d_{2,i}=1) \end{bmatrix} = \begin{bmatrix} \frac{1}{n_1}\sum_{i=1}^{n} y_i\,\mathbb{1}(d_{1,i}=1) \\ \frac{1}{n_2}\sum_{i=1}^{n} y_i\,\mathbb{1}(d_{2,i}=1) \end{bmatrix} = \begin{bmatrix} \bar{Y}_1 \\ \bar{Y}_2 \end{bmatrix}$$
11.4.2 Part b
The first transformation demeans Y by gender group, so that $Y^*$ has a sample mean of zero among both men and women. The second transformation similarly demeans X by gender group, so that $X^*$ has a sample mean of zero within each group. When running regressions with these variables, the coefficients are estimated from deviations from the group means of the data.
11.4.3 Part c
Let's start from the second OLS regression. We can rewrite this equation as
$$Y = X_1\hat{\beta}_1 + X_2\hat{\beta}_2 + \hat{e}$$
where
$$X_1 = \begin{bmatrix} D_1 & D_2 \end{bmatrix}, \qquad \hat{\beta}_1 = \begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \end{bmatrix}, \qquad X_2 = X, \qquad \hat{\beta}_2 = \hat{\beta}$$
Now, we don't care about $X_1$, so we can use the annihilator matrix to simplify the problem. Define $M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1'$. We can now write $\hat{\beta}_2$ as:
$$\hat{\beta}_2 = \big((M_1X)'M_1X\big)^{-1}(M_1X)'M_1Y$$
Okay, let's look at this equation piece by piece. First, we look at $M_1Y$:
$$M_1Y = \big(I - X_1(X_1'X_1)^{-1}X_1'\big)Y = Y - X_1(X_1'X_1)^{-1}X_1'Y = Y - X_1\begin{bmatrix}\bar{Y}_1 \\ \bar{Y}_2\end{bmatrix} = Y - D_1\bar{Y}_1 - D_2\bar{Y}_2 = Y^*$$
Now we can look at $M_1X$:
$$M_1X = \big(I - X_1(X_1'X_1)^{-1}X_1'\big)X = X - X_1(X_1'X_1)^{-1}X_1'X$$
$$= X - X_1\begin{bmatrix}\frac{1}{n_1} & 0 \\ 0 & \frac{1}{n_2}\end{bmatrix}\begin{bmatrix}\sum_{i=1}^{n} x_i'\,\mathbb{1}(d_{1,i}=1) \\ \sum_{i=1}^{n} x_i'\,\mathbb{1}(d_{2,i}=1)\end{bmatrix} = X - X_1\begin{bmatrix}\bar{x}_1' \\ \bar{x}_2'\end{bmatrix} = X - D_1\bar{x}_1' - D_2\bar{x}_2' = X^*$$
Now we have all the pieces we need to find $\hat{\beta}_2$:
$$\hat{\beta}_2 = \big((M_1X)'M_1X\big)^{-1}(M_1X)'M_1Y = \big((X^*)'X^*\big)^{-1}\big((X^*)'Y^*\big)$$
Turning to the first OLS regression, we apply our normal formula for the OLS estimator:
$$\tilde{\beta} = \big((X^*)'X^*\big)^{-1}\big((X^*)'Y^*\big)$$
which is the same estimator we derived from the second OLS regression. Therefore, the two regressions deliver the same results for the target β's.
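Numerically, the two regressions do deliver identical coefficients on X. A Python sketch under an invented DGP (the alternating gender indicator and coefficient values are my own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 120, 2
male = np.arange(n) % 2                       # alternating group indicator
D1, D2 = male.astype(float), (1 - male).astype(float)
X = rng.standard_normal((n, k))
Y = 1.0 * D1 + 2.0 * D2 + X @ np.array([0.5, -1.0]) + rng.standard_normal(n)

# full regression: Y on [D1, D2, X]; keep the coefficients on X
Z = np.column_stack([D1, D2, X])
beta_hat = np.linalg.lstsq(Z, Y, rcond=None)[0][2:]

# demeaned regression: Y* on X*, demeaning each variable by gender group
Ystar = Y - D1 * Y[male == 1].mean() - D2 * Y[male == 0].mean()
Xstar = (X - np.outer(D1, X[male == 1].mean(axis=0))
           - np.outer(D2, X[male == 0].mean(axis=0)))
beta_tilde = np.linalg.lstsq(Xstar, Ystar, rcond=None)[0]
```

This is the Frisch-Waugh-Lovell theorem in action: group demeaning is exactly what $M_1$ does here.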
11.5 Practice Problem 1: Hansen II 4.16 Adapted
Take the linear homoskedastic CEF:
$$Y^* = X'\beta + e, \qquad E[e|X] = 0, \qquad E[e^2|X] = \sigma^2$$
and suppose that $Y^*$ is measured with error. Instead of $Y^*$, we observe $Y = Y^* + u$ where u is measurement error. Suppose that e and u are independent and
$$E[u|X] = 0, \qquad E[u^2|X] = \sigma^2_u(X)$$
(a) Derive an equation for Y as a function of X.
(b) Describe the effect of this measurement error on OLS estimation of β in the feasible regression of the observed Y on X.
(c) Describe the effect (if any) of the measurement error on the asymptotic variance of $\hat{\beta}$.
11.5.1 Part a
Plug in for $Y^*$ in the main OLS equation:
$$Y^* = X'\beta + e$$
$$Y - u = X'\beta + e$$
$$Y = X'\beta + e + u = X'\beta + \eta, \qquad \text{where } \eta \equiv e + u$$
11.5.2 Part b
Using the standard estimator for $\beta_{OLS}$:
$$\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i(x_i'\beta + \eta_i)\right)$$
$$= \beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i e_i\right) + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i u_i\right) \tag{1}$$
Noting our assumptions that E[e|X] = 0 and E[u|X] = 0, we can derive:
$$E[xe] = E_x\big[xE[e|X]\big] = E_x[x\cdot 0] = 0, \qquad E[xu] = E_x\big[xE[u|X]\big] = E_x[x\cdot 0] = 0$$
Taking these two results and taking the probability limit of $\hat{\beta}$ gives us:
$$\hat{\beta} \xrightarrow{P} \beta + E[x_i x_i']^{-1}E[x_i e_i] + E[x_i x_i']^{-1}E[x_i u_i] = \beta$$
So we see that $\hat{\beta}$ is still consistent for β.
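A Monte Carlo sketch of parts (b) and (c) in Python (invented DGP; the heteroskedastic form of u is my own choice): $\hat{\beta}$ stays centered on β with or without the measurement error, but its sampling variance is larger with it.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps, beta = 300, 1000, 2.0
b_clean, b_noisy = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.standard_normal(n)
    e = rng.standard_normal(n)
    u = rng.standard_normal(n) * (1 + np.abs(x))   # heteroskedastic measurement error
    ystar = x * beta + e                           # latent outcome
    y = ystar + u                                  # observed outcome
    b_clean[r] = (x @ ystar) / (x @ x)             # OLS on the latent Y*
    b_noisy[r] = (x @ y) / (x @ x)                 # OLS on the observed Y
```

Both estimators average to β, while the dispersion of `b_noisy` reflects the extra $Q_{xx}^{-1}E[x_i u_i u_i x_i']Q_{xx}^{-1}$ term in the asymptotic variance.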
11.5.3 Part c
Starting from (1) in Part b and rearranging:
$$\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i e_i\right) + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i u_i\right)$$
To simplify the resulting expression, define $Q_{xx} \equiv E[x_i x_i']$. Then, as we take $n\to\infty$ (the independence of e and u means the two terms are asymptotically uncorrelated):
$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\big(0,\; Q_{xx}^{-1}E[x_i e_i e_i x_i']Q_{xx}^{-1} + Q_{xx}^{-1}E[x_i u_i u_i x_i']Q_{xx}^{-1}\big)$$
Note our assumption that $E[e^2|X] = \sigma^2$. Then:
$$E[x_i e_i e_i x_i'] = E_{x_i}\big[x_i x_i' E[e_i^2|x_i]\big] = \sigma^2 E[x_i x_i'] = \sigma^2 Q_{xx}$$
We can simplify our expression for the asymptotic distribution of $\hat{\beta}$:
$$\sqrt{n}(\hat{\beta} - \beta) \xrightarrow{d} N\big(0,\; \sigma^2 Q_{xx}^{-1} + Q_{xx}^{-1}E[x_i u_i u_i x_i']Q_{xx}^{-1}\big)$$
Note that we cannot simplify the second term because the conditional measurement errors are heteroskedastic.
11.6 Practice Problem 2: Hansen II 7.1
Take the model $Y = X_1'\beta_1 + X_2'\beta_2 + e$ with $E[Xe] = 0$. Suppose that $\beta_1$ is estimated by regressing Y on $X_1$ only. Find the probability limit of the estimator. In general, is it consistent for $\beta_1$? If not, under what conditions is this estimator consistent for $\beta_1$?
11.6.1 Solution
We start by defining $\eta \equiv x_{2i}'\beta_2 + e_i$. Then we have:
$$y_i = x_{1i}'\beta_1 + \eta_i$$
We now solve from the $\hat{\beta}_{OLS}$ estimator:
$$\hat{\beta}_1 = \left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{1i}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}y_i\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{1i}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}(x_{1i}'\beta_1 + \eta_i)\right)$$
$$= \beta_1 + \left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{1i}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}(x_{2i}'\beta_2 + e_i)\right)$$
$$= \beta_1 + \left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{1i}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{2i}'\right)\beta_2 + \left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}x_{1i}'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_{1i}e_i\right)$$
$$\xrightarrow{P} \beta_1 + E[x_{1i}x_{1i}']^{-1}E[x_{1i}x_{2i}']\beta_2 + E[x_{1i}x_{1i}']^{-1}E[x_{1i}e_i] = \beta_1 + E[x_{1i}x_{1i}']^{-1}E[x_{1i}x_{2i}']\beta_2$$
If $\beta_2 = 0$, then $\hat{\beta}_1 \xrightarrow{P} \beta_1$. The estimator is also consistent if the included and omitted regressors are uncorrelated, $E[x_{1i}x_{2i}'] = 0$, since the bias term again vanishes. Otherwise, $\hat{\beta}_1$ does not converge to $\beta_1$.
11.7 Practice Problem 3: Hansen II 7.15
Take the linear model $Y = X\beta + e$ with $E[e|X] = 0$ and $X \in \mathbb{R}$. Consider the estimator
$$\hat{\beta} = \frac{\sum_{i=1}^{n}x_i^3 y_i}{\sum_{i=1}^{n}x_i^4}$$
Find the asymptotic distribution of $\sqrt{n}(\hat{\beta} - \beta)$ as $n\to\infty$.
11.7.1 Solution
We take $\hat{\beta}$ and work from there:
$$\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i^4\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i^3 y_i\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i^4\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i^3(x_i\beta + e_i)\right)$$
$$= \beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i^4\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i^3 e_i\right)$$
$$\hat{\beta} - \beta = \left(\frac{1}{n}\sum_{i=1}^{n}x_i^4\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i^3 e_i\right)$$
Note that the last term is zero in expectation (by $E[e|X] = 0$). Therefore, if we multiply both sides by $\sqrt{n}$ we can apply the multivariate central limit theorem:
$$\sqrt{n}(\hat{\beta} - \beta) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i^4\right)^{-1}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i^3 e_i\right) \xrightarrow{d} E[x_i^4]^{-1}\,N\big(0,\;E[x_i^3 e_i e_i x_i^3]\big) = N\big(0,\;E[x_i^4]^{-1}E[x_i^3 e_i e_i x_i^3]\,E[x_i^4]^{-1}\big)$$
11.8 Practice Problem 4: Hansen II 7.23 Adapted
The model is $Y = X\beta + e$ with $E[e|X] = 0$ and $X \in \mathbb{R}$. Consider the estimator:
$$\tilde{\beta} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}$$
Is $\tilde{\beta}$ consistent for β as $n\to\infty$?
11.8.1 Solution
Starting from $\tilde{\beta}$:
$$\tilde{\beta} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i\beta + e_i}{x_i} = \frac{1}{n}\sum_{i=1}^{n}\left(\beta + \frac{e_i}{x_i}\right) = \beta + \frac{1}{n}\sum_{i=1}^{n}\frac{e_i}{x_i} \xrightarrow{P} \beta + E\left[\frac{e_i}{x_i}\right]$$
Recall our assumption that E[e|X] = 0. Then, using the LIE:
$$E\left[\frac{e_i}{x_i}\right] = E_{x_i}\left[\frac{1}{x_i}E[e_i|x_i]\right] = E_{x_i}\left[\frac{1}{x_i}\cdot 0\right] = 0$$
Then the probability limit of $\tilde{\beta}$ becomes:
$$\tilde{\beta} \xrightarrow{P} \beta + E\left[\frac{e_i}{x_i}\right] = \beta + 0 = \beta$$
Chapter 12
Summary of Econometrics I
12.1 Previous Problem: Hansen II 7.7
Of the variables $(Y^*, Y, X)$, only the pair (Y, X) is observed. In this case, we say that $Y^*$ is a latent variable. Suppose
$$Y^* = X'\beta + e, \qquad E[Xe] = 0, \qquad Y = Y^* + u$$
where u is a measurement error satisfying
$$E[Xu] = 0, \qquad E[Y^*u] = 0$$
Let $\hat{\beta}$ denote the OLS coefficient from the regression of Y on X.
(a) Is β the coefficient from the linear projection of Y on X?
(b) Is $\hat{\beta}$ consistent for β as $n\to\infty$?
(c) Find the asymptotic distribution of $\sqrt{n}(\hat{\beta} - \beta)$ as $n\to\infty$.
12.1.1 Part a
We use the linear projection formula for Y on X and then plug in for Y:
$$\beta = E[xx']^{-1}E[xy] = E[xx']^{-1}E[x(y^* + u)] = E[xx']^{-1}E[xy^*] + E[xx']^{-1}E[xu]$$
We know from the assumptions given that E[xu] = 0. Plugging this in:
$$\beta = E[xx']^{-1}E[xy^*] + E[xx']^{-1}\cdot 0 = E[xx']^{-1}E[xy^*]$$
This is the linear projection coefficient of $Y^*$ on X. So we see that β is both the projection coefficient of $Y^*$ on X and of Y on X.
12.1.2 Part b
We apply the OLS estimator to the observed equation:
$$\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i y_i\right) = \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i(y_i^* + u_i)\right)$$
$$= \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i(x_i'\beta + e_i)\right) + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i u_i\right)$$
$$= \beta + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i e_i\right) + \left(\frac{1}{n}\sum_{i=1}^{n}x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}x_i u_i\right) \tag{1}$$
Now take $n\to\infty$:
$$\hat{\beta} \xrightarrow{P} \beta + E[x_i x_i']^{-1}E[x_i e_i] + E[x_i x_i']^{-1}E[x_i u_i]$$
Given our assumptions that $E[x_i e_i] = 0$ and $E[x_i u_i] = 0$, we find that:
$$\hat{\beta} \xrightarrow{P} \beta$$
So $\hat{\beta}$ is consistent for β if our covariates are orthogonal to the measurement error term.
12.1.3 Part c
Starting from equation (1) in part (b):
ˆ
β = β +
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
e
i
!
+
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
u
i
!
ˆ
β β =
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
e
i
!
+
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
u
i
!
n
ˆ
β β
=
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
e
i
!
+
1
n
n
X
i=1
x
i
x
i
!
1
1
n
n
X
i=1
x
i
u
i
!
Assuming that E[eu] = 0, we can ignore the covariance term between the two errors when finding the
asymptotic variance:
\begin{align*}
\sqrt{n}\left( \hat{\beta} - \beta \right) &\xrightarrow{d} E[x_i x_i']^{-1} N\left(0, E[x_i x_i' e_i^2]\right) + E[x_i x_i']^{-1} N\left(0, E[x_i x_i' u_i^2]\right) \\
&= N\left( 0, \; E[x_i x_i']^{-1} E[x_i x_i' e_i^2] E[x_i x_i']^{-1} \right) + N\left( 0, \; E[x_i x_i']^{-1} E[x_i x_i' u_i^2] E[x_i x_i']^{-1} \right)
\end{align*}
Let $Q_{xx} \equiv E[x_i x_i']$. Then:
\[
\sqrt{n}\left( \hat{\beta} - \beta \right) \xrightarrow{d} N\left( 0, \; Q_{xx}^{-1} E[x_i x_i' e_i^2] Q_{xx}^{-1} + Q_{xx}^{-1} E[x_i x_i' u_i^2] Q_{xx}^{-1} \right)
\]
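In practice, each piece of this variance is estimated with sample moments. Since $E[eu] = 0$, the observed residual $v_i = e_i + u_i$ pools both variance pieces, so the usual sandwich estimator from the observed regression recovers the total asymptotic variance. A Python sketch under a hypothetical DGP (all shocks standard normal, so the true asymptotic variance works out to $2I$):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100_000, 2
beta = np.array([1.0, -0.5])

X = rng.normal(size=(n, k))
e = rng.normal(size=n)                       # projection error
u = rng.normal(size=n)                       # measurement error
y = X @ beta + e + u                         # observed outcome y* + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
v = y - X @ beta_hat                         # estimates v_i = e_i + u_i

Qxx = X.T @ X / n                            # sample E[x x']
Omega = (X * (v**2)[:, None]).T @ X / n      # sample E[x x' v^2]
Qinv = np.linalg.inv(Qxx)
avar = Qinv @ Omega @ Qinv                   # sandwich for sqrt(n)(beta_hat - beta)
print(avar)                                  # approximately 2 * I
```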
12.2 Maximum Likelihood Estimation: Hansen 10.8
Find the Cramér-Rao lower bound for $p$ in the Bernoulli model. In Section 10.3, we derived that the MLE for $p$ is $\hat{p} = \bar{X}_n$. Compute $Var(\hat{p})$. Compare $Var(\hat{p})$ with the Cramér-Rao lower bound.
12.2.1 Finding the Cramér-Rao lower bound
We first note that the Cramér-Rao lower bound is given by $(n\mathcal{I}_{\theta,1})^{-1}$, so we need to find the likelihood function. The pmf of a Bernoulli random variable is $\pi(x) = p^x(1-p)^{1-x}$. So:
\begin{align*}
L(p|x) &= \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} \\
\ell_n(p) &= \sum_{i=1}^{n} \ln\left( p^{x_i}(1-p)^{1-x_i} \right) \\
&= \sum_{i=1}^{n} x_i \ln(p) + (1 - x_i)\ln(1-p)
\end{align*}
Next, we know we need the score vector, so:
\begin{align*}
\frac{\partial \ell_n(p)}{\partial p} &= \frac{\partial}{\partial p}\left[ \sum_{i=1}^{n} x_i \ln(p) + (1 - x_i)\ln(1-p) \right] \\
&= \frac{\sum_{i=1}^{n} x_i}{p} - \frac{\sum_{i=1}^{n} (1 - x_i)}{1-p}
\end{align*}
If we assume that the information matrix equality holds, then we can use the second derivative:
\begin{align*}
\frac{\partial^2 \ell_n(p)}{\partial p^2} &= \frac{\partial}{\partial p}\left[ \frac{\sum_{i=1}^{n} x_i}{p} - \frac{\sum_{i=1}^{n} (1 - x_i)}{1-p} \right] \\
&= -\frac{\sum_{i=1}^{n} x_i}{p^2} - \frac{\sum_{i=1}^{n} (1 - x_i)}{(1-p)^2}
\end{align*}
Now take the negative expectation:
\begin{align*}
-E\left[ \frac{\partial^2 \ell_n(p)}{\partial p^2} \right] &= E\left[ \frac{\sum_{i=1}^{n} x_i}{p^2} + \frac{\sum_{i=1}^{n} (1 - x_i)}{(1-p)^2} \right] \\
&= \frac{1}{p^2}\sum_{i=1}^{n} E[x_i] + \frac{n}{(1-p)^2} - \frac{\sum_{i=1}^{n} E[x_i]}{(1-p)^2} \\
&= \frac{np}{p^2} + \frac{n}{(1-p)^2} - \frac{np}{(1-p)^2} \\
&= \frac{n}{p} + \frac{n(1-p)}{(1-p)^2} \\
&= \frac{n - np + np}{p(1-p)} \\
&= \frac{n}{p(1-p)}
\end{align*}
Since the information matrix equality is assumed to hold:
\[
\mathcal{I}_{\theta,n} = \frac{n}{p(1-p)}
\]
and because our data is i.i.d.:
\[
n\mathcal{I}_{\theta,1} = \frac{n}{p(1-p)}
\]
We have one last step. We must invert this last expression:
\[
(n\mathcal{I}_{\theta,1})^{-1} = \frac{p(1-p)}{n}
\]
This is the Cramér-Rao lower bound.
12.2.2 Variance of the Estimator
We still need to find $Var(\hat{p})$. So:
\begin{align*}
Var(\hat{p}) &= Var\left( \frac{1}{n}\sum_{i=1}^{n} x_i \right) \\
&= \frac{1}{n^2} Var\left( \sum_{i=1}^{n} x_i \right) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} Var(x_i) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} p(1-p) \\
&= \frac{n}{n^2}\, p(1-p) \\
&= \frac{p(1-p)}{n}
\end{align*}
Note that this equals the Cramér-Rao lower bound, so the MLE attains it.
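A quick simulation confirms that $Var(\hat{p})$ matches $p(1-p)/n$. This Python sketch uses $p = 0.3$ and $n = 50$, which are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 50, 20_000

x = rng.binomial(1, p, size=(reps, n))   # many Bernoulli samples of size n
p_hat = x.mean(axis=1)                   # MLE in each replication

crlb = p * (1 - p) / n                   # Cramer-Rao lower bound
print(p_hat.var(), crlb)                 # the two should be very close
```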
12.3 Method of Moments Estimation: Hansen 11.3
A Bernoulli random variable $X$ satisfies:
\begin{align*}
P(X = 0) &= 1 - p \\
P(X = 1) &= p
\end{align*}
(a) Propose a moment estimator $\hat{p}$ for $p$.
(b) Find the variance of the asymptotic distribution of $\sqrt{n}(\hat{p} - p)$.
(c) Propose an estimator of the asymptotic variance of ˆp.
12.3.1 Part a
We begin with the moment conditions:
\begin{align*}
E[x_i - \mu] &= 0 \\
E[x_i] &= \mu
\end{align*}
Note that for a Bernoulli random variable $\mu = p$. Use the method of moments estimator:
\begin{align*}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{p} &= \frac{1}{n}\sum_{i=1}^{n} x_i
\end{align*}
12.3.2 Part b
Using the WLLN, we know that $\hat{p} \xrightarrow{p} p$, and by the CLT, $\sqrt{n}(\hat{p} - p) \xrightarrow{d} N(0, V)$. So we just need to find the variance:
\begin{align*}
V &= E[(x - p)(x - p)] \\
&= E[(x - p)^2] \\
&= p(1-p)
\end{align*}
12.3.3 Part c
This part is really as easy as it seems. Put hats on top of the unknown parameters in our asymptotic variance:
\[
\widehat{AVAR} = \hat{p}(1 - \hat{p})
\]
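To sketch how this plug-in works on data, here is a short Python example; the sample below is simulated from a hypothetical $p = 0.4$:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.4, size=1_000)   # hypothetical Bernoulli sample

p_hat = x.mean()                       # method of moments estimator
avar_hat = p_hat * (1 - p_hat)         # plug-in asymptotic variance
se = np.sqrt(avar_hat / len(x))        # implied standard error of p_hat
print(p_hat, se)
```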
12.4 Regression Tests: Hansen 13.3 Extended
Take the exponential model with parameter $\lambda$. We want a test for $H_0: \lambda = 1$ against $H_A: \lambda \neq 1$.
(a) Develop a test based on the sample mean $\bar{X}_n$.
(b) Find the likelihood ratio statistic.
(c) Find the score test.
(d) Find the Wald statistic.
12.4.1 Part a
We begin with the standard asymptotic t-test:
\begin{align*}
T(x, \mu_0) &= \frac{\sqrt{n}(\bar{x} - \mu_0)}{\hat{s}} \\
&= \frac{\sqrt{n}(\bar{x} - \lambda_0)}{\hat{s}} \\
&= \frac{\sqrt{n}(\bar{x} - 1)}{\hat{s}}
\end{align*}
This is our asymptotic t-test.
Note that in this scenario, because we know the true distribution, we can actually calculate the
standard deviation:
\begin{align*}
Var(\sqrt{n}\,\bar{x}) &= n\, Var\left( \frac{1}{n}\sum_{i=1}^{n} x_i \right) \\
&= \frac{1}{n}\sum_{i=1}^{n} Var(x_i) \\
&= \frac{1}{n}\sum_{i=1}^{n} \hat{\lambda}^2 \\
&= \hat{\lambda}^2 = \bar{x}^2
\end{align*}
Then our t-test becomes:
\begin{align*}
T(x) &= \frac{\sqrt{n}(\bar{x} - 1)}{\hat{s}} \\
&= \frac{\sqrt{n}(\bar{x} - 1)}{\sqrt{\bar{x}^2}} \\
&= \frac{\sqrt{n}(\bar{x} - 1)}{\bar{x}}
\end{align*}
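This statistic is simple to compute. A Python sketch (the data are simulated under the null, $\lambda = 1$, purely for illustration):

```python
import numpy as np

def t_stat(x):
    """Asymptotic t-test of H0: lambda = 1, using s_hat = x_bar."""
    xbar = x.mean()
    return np.sqrt(len(x)) * (xbar - 1) / xbar

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=400)   # data generated under H0
print(t_stat(x))                           # roughly N(0,1) under H0
```

When the sample mean is exactly 1, the statistic is exactly zero, and under the null it rarely strays far from the standard normal range.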
12.4.2 Part b
Using the formula for the likelihood ratio statistic, and noting that the exponential distribution has a pdf of $f(x|\lambda) = \frac{1}{\lambda} e^{-x/\lambda}$:
\begin{align*}
lr(x) &= 2\left[ \ln L(\hat{\theta}|y) - \ln L(\theta_0|y) \right] \\
&= 2\left[ \ln\left( \prod_{i=1}^{n} \frac{1}{\hat{\lambda}} e^{-x_i/\hat{\lambda}} \right) - \ln\left( \prod_{i=1}^{n} \frac{1}{1} e^{-x_i/1} \right) \right] \\
&= 2\left[ \sum_{i=1}^{n} \left( -\ln(\hat{\lambda}) - \frac{1}{\hat{\lambda}} x_i \right) + \sum_{i=1}^{n} x_i \right] \\
&= 2\left[ -n\ln(\hat{\lambda}) - \frac{n}{\hat{\lambda}}\bar{x}_n + n\bar{x}_n \right] \\
&= 2\left[ n\bar{x}_n - n - n\ln(\hat{\lambda}) \right] \xrightarrow{d} \chi^2_1
\end{align*}
where the last line uses $\hat{\lambda} = \bar{x}_n$, so that $\frac{n}{\hat{\lambda}}\bar{x}_n = n$.
12.4.3 Part c
We first need to find the score evaluated at the null hypothesis of $\lambda_0 = 1$:
\begin{align*}
\left. \frac{d\ell_n(\lambda)}{d\lambda} \right|_{\lambda = \lambda_0} &= -\frac{n}{\lambda_0} + \frac{n}{\lambda_0^2}\bar{x}_n \\
&= -\frac{n}{1} + \frac{n}{1}\bar{x}_n \\
&= n(\bar{x}_n - 1)
\end{align*}
Assuming the information matrix equality holds, the negative Hessian, and thus the information,
evaluated at the null value is:
\[
\mathcal{I}_{\lambda_0,1} = \frac{1}{\lambda_0^2} = 1
\]
So the score test is:
\begin{align*}
T_s(x) &= \frac{1}{n}\, n(\bar{x}_n - 1) \cdot 1 \cdot n(\bar{x}_n - 1) \\
&= n(\bar{x}_n - 1)^2 \xrightarrow{d} \chi^2_1
\end{align*}
12.4.4 Part d
Last test to derive. Using the constraint g(·) defined above:
\begin{align*}
g(\lambda) &= \lambda - \lambda_0 \\
&= \lambda - 1 = 0
\end{align*}
Now we need the derivative of g(λ) with respect to the true λ:
\[
\frac{\partial g(\lambda)}{\partial \lambda} = 1
\]
Using the Wald statistic formula from above:
\begin{align*}
T_w(x) &= n\, g(\hat{\lambda}_n)' \left[ \frac{\partial g(\lambda)}{\partial \lambda} \mathcal{I}^{-1}_{\hat{\lambda}_n,1} \frac{\partial g(\lambda)}{\partial \lambda} \right]^{-1} g(\hat{\lambda}_n) \\
&= n(\hat{\lambda}_n - 1) \cdot \frac{1}{\hat{\lambda}_n^2} \cdot (\hat{\lambda}_n - 1) \\
&= \frac{n(\hat{\lambda}_n - 1)^2}{\hat{\lambda}_n^2} \\
&= \frac{n(\bar{x}_n - 1)^2}{\bar{x}_n^2} \xrightarrow{d} \chi^2_1
\end{align*}
This result is the square of our t-test as expected.
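All of these statistics can be compared numerically. The following Python sketch computes the lr, score, and Wald statistics from the formulas above and checks that the Wald statistic equals the squared t-test (data simulated under the null; the sample size is an arbitrary choice):

```python
import numpy as np

def exp_test_stats(x):
    """lr, score, and Wald statistics for H0: lambda = 1 in the exponential model."""
    n, xbar = len(x), x.mean()
    lam_hat = xbar                               # MLE of the scale parameter
    lr = 2 * (n * xbar - n - n * np.log(lam_hat))
    score = n * (xbar - 1) ** 2
    wald = n * (xbar - 1) ** 2 / xbar ** 2
    return lr, score, wald

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=500)
lr, score, wald = exp_test_stats(x)

t = np.sqrt(len(x)) * (x.mean() - 1) / x.mean()  # t-test from part a
print(lr, score, wald, t**2)                     # wald and t^2 coincide
```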
12.5 Final
These problems should mostly cover the topics that could be on your final. We have gone through
regression, consistency, asymptotic normality, MLE, MoM, and standard hypothesis testing (look at
confidence intervals before Monday). Know these well and you should be fine on the final.
Some tips: do not leave anything blank; at the very least, sketch out how you would approach the problem. As always, I am available for questions should you have any. If you have a question that would be difficult to answer over email, feel free to email me to set up a time to meet, or just check to see if I am in my office.
Good luck on Monday and have a great break!